Image Data Publication with OMERO

Jason Swedlow-2

*****
To join, leave or search the confocal microscopy listserv, go to:
http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
Post images on http://www.imgur.com and include the link in your posting.
*****

Dear All-

We’ve previously discussed data publication and release on these lists.  To
follow up on this, we wanted to mention work OME has been doing on a
BBSRC-funded project, a public Image Data Repository (IDR).

The aim is to develop the next proofs of concept for large-scale data
publication.  There are already several existing public resources using
OMERO to publish image data at scale (e.g., JCB DataViewer
<http://jcb-dataviewer.rupress.org/>, LINCS Database
<http://lincs.hms.harvard.edu/>, Stowers ODR
<http://www.stowers.org/research/publications/odr>, EMDataBank
<http://www.emdatabank.org/>, SSBD <http://ssbd.qbic.riken.jp/>, MovinCell
<http://movincell.org/>, IMPC <http://www.mousephenotype.org/>).  In the
current project, we want to demonstrate the value of aggregating several
large imaging studies into one resource and drawing as many links as
possible between them, at the level of perturbations, ontological
annotations and quantitative features.  Finally, we want to test the
feasibility of making these data available in a virtual resource that users
can use to test their own algorithms and datasets.

[A fuller description of the project is at
http://www.bbsrc.ac.uk/research/grants/grants/AwardDetails.aspx?FundingReference=BB/M018423/1.]


These are obviously quite ambitious goals, but we are making some
progress.  We’d like to show you all where we are, ask for comments or
ideas, and maybe stimulate your own ideas about how you can contribute to the
project or use these resources (both data and software) for your own work.

The IDR resources are now up and available at:

http://idr-demo.openmicroscopy.org
Data from 12 genetic, siRNA, chemical or geographic studies, across 16
screens from a series of published papers.  This link includes several
screens, and pointers to:

* http://idr-demo.openmicroscopy.org/mito
  Data (images and metadata) from the Mitocheck screen (http://mitosys.org)
* http://idr-demo.openmicroscopy.org/tara
  Data (images and metadata) from the Tara Oceans study
  (http://oceans.taraexpeditions.org/en/)
* http://idr-demo.openmicroscopy.org/pgpc
  Data from the recent paper by Breinig et al.
  (http://msb.embopress.org/content/11/12/846)

These studies are available from different URLs as this helped us during
the data loading phase.

In total, there are:
* 12 studies, 16 screens
* 2500 plates
* 29M frames
* 37TB of image data

We are continuing to add more datasets to this resource.

All of these are available via the “vanilla” OMERO web interface.  There is
no question that this interface isn’t built for this type of use case, and
the datasets “stretch” and in some cases break the design of this UI.
We’ll be doing work on the UI in due course.  Our goal was to get the data
integration done and have a foundation for designing, testing and building
a virtual analysis resource (which is the next phase of our work).

Again, any thoughts or comments are welcome. Obviously, if you know of datasets
that would be candidates for this resource, please do let us know. The
principles for datasets we want to include are written up in the
Euro-BioImaging/Elixir Data Strategy
(http://www.eurobioimaging.eu/content-news/euro-bioimaging-elixir-image-data-strategy).

Thanks again for your support and comments.

Cheers,

Jason

--
**************************
Centre for Gene Regulation & Expression
School of Life Sciences
University of Dundee
Dundee  DD1 5EH
United Kingdom

phone (01382) 385819
Intl phone:  44 1382 385819
FAX   (01382) 388072
email: [hidden email]

Lab Page: http://www.lifesci.dundee.ac.uk/people/jason-swedlow
Open Microscopy Environment: http://openmicroscopy.org
**************************

George McNamara

Re: Image Data Publication with OMERO

Hi Jason,

With respect to cell biology, how about consolidating all these
resources, and getting BBSRC (as you mentioned), EMBL, Sanger Institute,
NIH, HHMI, and other funders (Gates Foundation wrt malaria, TB, etc.)
in one place, such as "The Cell: an Image Library" (which I believe
has OMERO under the hood),
http://www.cellimagelibrary.org/
(minor detail: that link is not working from home right now),

and eventually a way to move the data hosting to tera/peta/exa-scalable
hosts, such as NIH's NCBI and European and Asian equivalents
(or they can pay to have Amazon Web Services, or Google, Microsoft, etc.
equivalents host it at a lower cost).

The Big Data to Knowledge (BD2K) initiative
http://commonfund.nih.gov/bd2k/index
https://commonfund.nih.gov/bd2k/grants
has money, and hopefully a current RFA, that might help.

Have you spoken or met with Philip Bourne, NIH's Associate Director for
Data Science (https://commonfund.nih.gov/bd2k/members)? Philip knows big
data: he was director of the Protein Data Bank (now at
http://www.rcsb.org/pdb/home/home.do) before joining NIH.

From http://murphylab.web.cmu.edu/software/searcher/ it looks like
OMERO runs "under the hood" of www.ProteinAtlas.org and that Robert
Murphy has already figured out how to do analysis from both The Cell
library and Human Protein Atlas. Can you 'simply' scale up from Robert's
work?

best wishes,

George
p.s. Listserv members: most light microscopy web sites have boring
1980s-style HeLa DAPI/F-actin/microtubules images. They can be more
colorful than that. I'm a little behind on my reading and only last week
came across "ColorfulCell", a mammalian expression plasmid to express
fluorescent proteins, $65 from Addgene:

Sladitschek HL, Neveu PA. MXS-Chaining: A Highly Efficient Cloning
Platform for Imaging and Flow Cytometry Approaches in Mammalian Systems.
PLoS One. 2015 Apr 24;10(4):e0124958. doi: 10.1371/journal.pone.0124958.
eCollection 2015. PII: PONE-D-15-01385.
PubMed 25909630 <http://pubmed.org/25909630>

$65  ColorfulCell      https://www.addgene.org/62449/
$375 MXS Chaining Kit  https://www.addgene.org/kits/mxs-chaining/
More stuff from Pierre Neveu's lab: https://www.addgene.org/Pierre_Neveu/

It would be useful if all the best FPs from The Michael Davidson
Collection were made into MXS components, and other state-of-the-art
FPs added to Addgene and MXS. I've previously suggested additional ways
to improve FP SNR by localization: http://works.bepress.com/gmcnamara/75


--



George McNamara, Ph.D.
Single Cells Analyst, T-Cell Therapy Lab (Cooper Lab)
University of Texas M.D. Anderson Cancer Center
Houston, TX 77054
Tattletales http://works.bepress.com/gmcnamara/42
http://works.bepress.com/gmcnamara/75
https://www.linkedin.com/in/georgemcnamara

Tim Feinstein

Re: Image Data Publication with OMERO

In reply to this post by Jason Swedlow-2

Hi Jason,

Very cool project.  I think those resources will be most useful if they
include descriptive metadata.  At a quick glance it appears the images
have extensive experimental metadata (siRNA target etc.) but they lack
technical details like microns per pixel, wavelength, camera, objective
etc.  I would not feel comfortable performing a quantitative analysis on
this dataset or comparing it with my own experimental data as it is.  In
my experience OMERO should have no problem gathering at least some of that
from raw image files.  Possibly you can find this info in the methods
section of the original reports, although those rarely include as much as
OMERO could glean.

If it is not too much of a burden, perhaps you could suggest that
experimenters fill out at least the basics when processed images are
uploaded without their original metadata. These high-throughput studies
often use the same settings across the board, so it may not be that hard to
update them all at once.

All the best,


Tim


Timothy Feinstein, Ph.D.
Research Scientist
University of Pittsburgh Department of Developmental Biology

jerie

Re: Image Data Publication with OMERO

Dear colleagues,

With the recent explosion of data volumes through our advances in imaging
automation, light sheet, large volumes, high content screening, volume EM,
multibeam EM, etc., I feel really uncomfortable with the idea of shifting
all these data to centralized depositories or to the cloud. I think we
should rather think about decentralized storage and ways to remotely access
data subsets. We have machines running that produce Gigabytes per second,
and Terabytes per experiment. With a total of 37TB as a test dataset we
will not be able to address these challenges.
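
To put the scale argument in numbers, here is a rough back-of-envelope sketch. It assumes a single instrument sustaining 1 GB/s and decimal units (1 TB = 10^12 bytes); the sustained rate and the helper name `hours_to_fill` are illustrative assumptions, not figures from this thread.

```python
# Back-of-envelope estimate only. The 1 GB/s sustained rate is an assumption
# chosen as a low-end reading of "Gigabytes per second".
TB = 10**12  # decimal terabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes


def hours_to_fill(total_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to produce total_bytes at a constant acquisition rate."""
    return total_bytes / rate_bytes_per_s / 3600


# 37 TB is the total size quoted for the demo repository.
hours = hours_to_fill(37 * TB, 1 * GB)
print(f"37 TB at 1 GB/s: {hours:.1f} hours")
```

Under these assumptions, one such instrument would generate the equivalent of the whole 37TB collection in well under a day of continuous acquisition.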

just my 2 cents
Abraços,
Jens

Dr. Jens Rietdorf, visiting scientist @ center for technological
development in health (CDTS), Rio de Janeiro, Brasil.

Jason Swedlow-2

Re: Image Data Publication with OMERO

Dear All-

Thanks for the comments and feedback.  Taking them in reverse order.

Jens Rietdorf:

"with the recent explosion of data volumes through our advances in imaging
automation, light sheet, large volumes, high content screening, volume EM,
multibeam EM, etc., I feel really uncomfortable with the idea of shifting
all these data to centralized depositories or to the cloud. I think we
should rather think about decentralized storage and ways to remotely access
data subsets."

Exactly true, as this list knows all too well.  A few short points:

1. The scientific value of such repositories, and the cost of running them
(cf. George's TBs, PBs, and EBs), aren't yet justified.  I'd argue we
need to run these proofs of concept, and build up the technology and
capabilities until we have a strong case that storing, publishing and
annotating these data has real value.
2. Capabilities for handling and including decentralised data are
absolutely necessary; this is the basis of a funding proposal we have
already submitted. Building a resource that performs well on this type of
architecture isn't trivial, and takes work.  We do think it is possible to
extend the architecture we already have to achieve this.

Tim Feinstein:

"I think those resources will be most useful if they include descriptive
metadata.  At a quick glance it appears the images have extensive
experimental metadata (siRNA target etc.) but they lack technical details
like microns per pixel, wavelength, camera, objective etc.  I would not
feel comfortable performing a quantitative analysis on this dataset or
comparing it with my own experimental data as it is.  In my experience
OMERO should have no problem gathering at least some of that from raw image
files.  Possibly you can find this info in the methods section of the
original reports, although those rarely include as much as OMERO could
glean."

Great point. One example: click on the Sysgro screen, click on any plate,
choose any of the measurement runs, click on any image, and look in the right
hand panel of the UI; the pixel size is there.  If you double-click on any
thumbnail to open an image viewer and click "Image Information", it is
visible there as well.  Of course, this is the same info, all from the
OMERO API, read by Bio-Formats from the incoming files.

This metadata is there for most (all?) of the screens. Look a bit more
closely and there are issues: compare the metadata visible under
the Acquisition tab in the right hand panel when you view any image in one
study with an image in another study. Why are they different?  It's the
age-old problem of different file formats, and thus different submitted
metadata. We didn't have the resources to reconcile metadata differences
in this project.  However, an outcome of all this is that we now have
standardized means of reading all the metadata submitted with these
studies. We are working on packaging these up as scripts and tools anyone
can use to read large, heterogeneous collections of metadata.
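
As a purely illustrative sketch of what reconciling such differences can involve (this is not the IDR tooling, nor the OMERO or Bio-Formats API): suppose each file format reports pixel size under a different key and unit. `PhysicalSizeX` borrows the attribute name used in the OME data model; the other key names, the unit factors and the helper `normalize_pixel_size` are hypothetical.

```python
from typing import Mapping, Optional

# Hypothetical per-format key names for the same quantity, each mapped to the
# factor that converts its value to micrometres. Only "PhysicalSizeX" echoes a
# real OME model attribute name; the rest are made up for illustration.
PIXEL_SIZE_KEYS = {
    "PhysicalSizeX": 1.0,     # assumed to be in micrometres already
    "pixel_size_nm": 1e-3,    # nanometres -> micrometres
    "XResolution_mm": 1e3,    # millimetres -> micrometres
}


def normalize_pixel_size(meta: Mapping[str, float]) -> Optional[float]:
    """Return the pixel size in micrometres, or None if no known key is present."""
    for key, to_um in PIXEL_SIZE_KEYS.items():
        if key in meta:
            return float(meta[key]) * to_um
    return None


# Two studies reporting the same kind of value in different ways, and one
# record that carries no pixel size at all.
print(normalize_pixel_size({"PhysicalSizeX": 0.65}))
print(normalize_pixel_size({"pixel_size_nm": 325.0}))
print(normalize_pixel_size({"exposure_ms": 20}))
```

Returning `None` rather than guessing a default keeps missing technical metadata visible, which matters for the kind of quantitative comparison Tim describes.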

George McNamara:

[Sorry, had to summarize]

"With respect to cell biology, how about consolidating all these resources,
and getting BBSRC (as you mentioned), EMBL, Sanger Institute, NIH, HHMI,
and other funders -- Gates Foundation wrt malaria, TB, etc -- , in one
place, such as "The Cell: an Image Library" (which I believe has OMERO
under the hood), http://www.cellimagelibrary.org/ (minor detail: that link
is not working from home right now), and eventually a way to move the data
hosting to tera/peta/exa-scalable hosts, such as NIH's NCBI and European
and Asian equivalents (or they can pay to have Amazon Web Services, or
Google, Microsoft, etc, equivalents) host it at a lower cost."

As indicated above, we don't think it's possible yet to justify the
creation (read, "funding") of such a resource; therefore we are pursuing
these proofs of concept. There are still lots of things to build, test, and
understand.  In our opinion, the ASCB CELL library (built by Glencoe
Software and ASCB; note a conflict, I founded and run Glencoe Software) was
one such proof of concept, testing ideas around annotation and community
submission.  To our knowledge ASCB CELL is running on OMERO 4.1 (released
Dec 2009) and thus is missing a lot of the facilities we have added to OMERO
in the last five years to make it work for TB-scale data publishing.

That being said, the ultimate destination of such resources should be
institutions that are dedicated to supporting them.  That is why (see the
project description) we have partnered with Alvis Brazma's group at the
EBI, and in particular the BioStudies project he is leading
(http://msb.embopress.org/content/11/12/847). We (OME and the IDR project)
agree that a combination of dedicated, structured repositories and homes
for less structured data will be required for long-term solutions to data
deposition, preservation and access in imaging, and elsewhere.

Short responses to the other comments-- we have worked with Bob Murphy's
group (see http://www.openmicroscopy.org/site/products/partner) and others.
You are correct that the resource we now have and hope to continue to grow
would be a great way to test concepts in image similarity-- is it possible
to draw computational similarities between, for example, the microtubule
arrays of S. pombe and U2OS cells?

Funding-- yes, there are lots of opportunities. This is an open source
project, which we aim to continue to drive and grow. We welcome
collaboration and participation.

Thanks again-- we appreciate all comments and feedback.

Cheers,

Jason

Vitaly Boyko

Re: Image Data Publication with OMERO

Hi Jason,
I have just gone through both OME and ELIXIR and was a bit lost, as I could
not find any conclusions and/or summaries, or well-defined goals (I see only
rather general statements). It still feels a bit "diluted" and not very
focused. Data are usually "endless": somebody with broad expertise in
imaging, biology and biophysics has to sort things out and define filters to
discard data and keep only valuable, high-quality, quantifiable imaging
data. Also, data import is still very slow at most locations in 2016. Also, I
do not see much funding here or elsewhere for imaging, and very limited
funding for analytical work, meaning evaluation of data quality and setting
priorities: let's say, imaging biological molecules at their physiological
concentrations, or measuring and monitoring protein/RNA/DNA/ligand
concentrations over time in a cell or tissue. Or have I missed an imaging
meeting or two?
Best regards,
Vitaly