Re: Image Data Publication with OMERO

Posted by Vitaly Boyko on
URL: http://confocal-microscopy-list.275.s1.nabble.com/Image-Data-Publication-with-OMERO-tp7584599p7584611.html

*****
To join, leave or search the confocal microscopy listserv, go to:
http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
Post images on http://www.imgur.com and include the link in your posting.
*****

Hi Jason,
I have just gone through both OME and ELIXIR and was a bit lost: I could not find any conclusions or summaries, or any well-defined goals (I see only rather general statements). It still feels a bit "diluted" and not very focused. Data are usually "endless"; somebody with broad expertise in imaging, biology and biophysics has to sort things out and define filters that keep only valuable, high-quality, quantifiable imaging data and discard the rest. Also, data import is still very slow at most locations in 2016. I also do not see much funding here or elsewhere for imaging, and very limited funding for analytical work, meaning evaluation of data quality and setting of priorities: let's say, imaging biological molecules at their physiological concentrations, or measuring and monitoring protein/RNA/DNA/ligand concentrations over time in a cell or tissue. Or have I missed an imaging meeting or two?
Best regards,
Vitaly

On Thursday, January 7, 2016 4:19 AM, Jason Swedlow <[hidden email]> wrote:

Dear All-

Thanks for the comments and feedback.  Taking them in reverse order.

Jens Rietdorf:

"With the recent explosion of data volumes through our advances in imaging
automation, light sheet, large volumes, high content screening, volume EM,
multibeam EM etc., I feel really uncomfortable with the idea of shifting
all these data to centralized depositories or to the cloud. I think we
should rather think about decentralized storage and ways to remotely access
data subsets."

Exactly true, as this list knows all too well.  A few short points:

1. The scientific value of such repositories, and the cost of running them
(cf. George's TBs, PBs, and EBs), aren't yet justified.  I'd argue we
need to run these proofs of concept, and build up the technology and
capabilities until we have a strong case that storing, publishing and
annotating these data has real value.
2. Capabilities for handling and including decentralised data are
absolutely necessary; this is the basis of a funding proposal we have
already submitted.  Building a resource that performs well with this type
of architecture isn't trivial and takes work.  We do think it is possible
to extend the architecture we already have to achieve this.

Tim Feinstein:

"I think those resources will be most useful if they include descriptive
metadata.  At a quick glance it appears the images have extensive
experimental metadata (siRNA target etc.) but they lack technical details
like microns per pixel, wavelength, camera, objective etc.  I would not
feel comfortable performing a quantitative analysis on this dataset or
comparing it with my own experimental data as it is.  In my experience
OMERO should have no problem gathering at least some of that from raw image
files.  Possibly you can find this info in the methods section of the
original reports, although those rarely include as much as OMERO could
glean."

Great point.  One example: click on the Sysgro screen, click on any plate,
choose any of the measurement runs, click on any image, and look in the
right-hand panel of the UI-- the pixel size is there.  If you double-click
on any thumbnail to open an image viewer and click "Image Information", it
is visible there as well.  Of course, this is the same info in both places,
all served by the OMERO API and read by Bio-Formats from the incoming files.

This metadata is there for most (all?) of the screens.  Look a bit more
closely and there are issues-- compare the metadata visible under the
Acquisition tab in the right-hand panel when you view any image in one
study with that of an image in another study.  Why are they different?
It's the age-old problem of different file formats, and thus different
submitted metadata.  We didn't have the resources to reconcile metadata
differences in this project.  However, one outcome of all this is that we
now have standardized means of reading all the metadata submitted with
these studies.  We are working on packaging these up as scripts and tools
anyone can use to read large, heterogeneous collections of metadata.

George McNamara:

[Sorry, had to summarize]

"With respect to cell biology, how about consolidating all these resources,
and getting BBSRC (as you mentioned), EMBL, Sanger Institute, NIH, HHMI,
and other funders -- Gates Foundation wrt malaria, TB, etc. --, in one
place, such as "The Cell: an Image Library" (which I believe has OMERO
under the hood), http://www.cellimagelibrary.org/ (minor detail: that link
is not working from home right now), and eventually a way to move the data
hosting to tera/peta/exa-scalable hosts, such as NIH's NCBI and European
and Asian equivalents (or they can pay to have Amazon Web Services, or
Google, Microsoft, etc., equivalents) host it at a lower cost."

As indicated above, we don't think it's possible yet to justify the
creation (read: "funding") of such a resource; therefore we are pursuing
these proofs of concept.  There are still lots of things to build, test,
and understand.  In our opinion, the ASCB CELL library (built by Glencoe
Software and ASCB; note a conflict of interest: I founded and run Glencoe
Software) was one such proof of concept, testing ideas around annotation
and community submission.  To our knowledge ASCB CELL is running on OMERO
4.1 (released Dec 2009) and thus is missing a lot of the facilities we have
added to OMERO in the last five years to make it work for TB-scale data
publishing.

That being said, the ultimate destination of such resources should be
institutions that are dedicated to supporting them.  That is why (see the
project description) we have partnered with Alvis Brazma's group at the
EBI, and in particular the BioStudies project he is leading
(http://msb.embopress.org/content/11/12/847).  We (OME and the IDR project)
agree that a combination of dedicated, structured repositories and homes
for less structured data will be required for long-term solutions to data
deposition, preservation and access in imaging, and elsewhere.

Short responses to the other comments-- we have worked with Bob Murphy's
group (see http://www.openmicroscopy.org/site/products/partner) and others.
You are correct that the resource we now have and hope to continue to grow
would be a great way to test concepts in image similarity-- is it possible
to draw computational similarities between, for example, the microtubule
arrays of S. pombe and U2OS cells?

Funding-- yes, there are lots of opportunities. This is an open source
project, which we aim to continue to drive and grow. We welcome
collaboration and participation.

Thanks again-- we appreciate all comments and feedback.

Cheers,

Jason

On Wed, Jan 6, 2016 at 7:17 PM, jens rietdorf <[hidden email]> wrote:

> Dear colleagues,
>
> With the recent explosion of data volumes through our advances in imaging
> automation, light sheet, large volumes, high content screening, volume EM,
> multibeam EM etc., I feel really uncomfortable with the idea of shifting
> all these data to centralized depositories or to the cloud. I think we
> should rather think about decentralized storage and ways to remotely access
> data subsets. We have machines running that produce Gigabytes per second,
> and Terabytes per experiment. With a total of 37TB as a test dataset we
> will not be able to address these challenges.
>
> just my 2 cents
> Abraços,
> Jens
>
> Dr. Jens Rietdorf, visiting scientist @ center for technological
> development in health (CDTS), Rio de Janeiro, Brasil.
>



--
**************************
Centre for Gene Regulation & Expression
School of Life Sciences
University of Dundee
Dundee  DD1 5EH
United Kingdom

phone (01382) 385819
Intl phone:  44 1382 385819
FAX  (01382) 388072
email: [hidden email]

Lab Page: http://www.lifesci.dundee.ac.uk/people/jason-swedlow
Open Microscopy Environment: http://openmicroscopy.org/
**************************