analyzing really big data sets

Brian Northan

Re: analyzing really big data sets *Commercial Response*

*****
To join, leave or search the confocal microscopy listserv, go to:
http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
Post images on http://www.imgur.com and include the link in your posting.
*****

Hi Steffen

> Why do I have to wait for all color channels/time points of a data set to
> be in RAM (or even worse: virtual memory) to start to
> display/deconvolve/analyze the data? That is a concept that doesn't cut it,
> and not for the first time in history. It seems the major advantage of this
> concept is that it is easier to implement for the software developers.


I've worked as a software engineer, developing both commercial and open
source software for microscopy, for 15 years now.  Often, software
engineers are working under tight deadlines on under-resourced projects.
If things are not implemented "optimally", it is usually a funding/marketing
decision, not the choice of the individual developers.

I would recommend getting a demo and testing the heck out of any
commercial product before purchasing, and if you are not satisfied with
it (and I agree, some are not implemented optimally) I would voice your
concerns before purchasing... in my experience this is the quickest way to
get a "speed improvement" feature request sent to the developers.

Also, it would be great if projects like ImageJ/Fiji continue to get funding.
These projects require maintenance in order to handle large datasets
optimally.  Many of the core components of ImageJ are licensed
"permissively", meaning they could also potentially be integrated into
commercial products.  One of the problems in microscopy software is that
each company often has to re-invent the wheel on its own (file formats,
memory handling, etc.); moving towards common core components (again
licensed "permissively" so they can be integrated into commercial products)
would help solve a lot of these problems.

Brian






On Tue, Aug 2, 2016 at 7:47 AM, Steffen Dietzel <[hidden email]> wrote:

>
> On 01.08.2016 at 14:28, Remko Dijkstra wrote:
>
>>
>> To point out some numbers when it comes to data transfer rates:
>> SSD's typically have a reading speed of ~500 MB/sec. To read a 1 TB
>> dataset from such an SSD, this takes roughly 2000 seconds (=33 minutes!) to
>> load it in RAM and make it available for processing.
>>
>
> Which is exactly why it should not be necessary to load the complete data
> set into RAM to start working on it. Thank you for making this point.
>
> Why do I have to wait for all color channels/time points of a data set to
> be in RAM (or even worse: virtual memory) to start to
> display/deconvolve/analyze the data? That is a concept that doesn't cut it,
> and not for the first time in history. It seems the major advantage of this
> concept is that it is easier to implement for the software developers.
>
> When I started with confocal microscopy in the '90s, a 40 MByte 3D,
> 3-color data set was huge and we ran into similar problems. My personal
> impression (based on nothing but gut feeling) is that since then, RAM
> generally grew faster than our data size, so eventually the problem went
> away. Until a few years ago, when fast high resolution techniques came
> around (resonant scanners, 4 Mpixel cameras, light sheet..).  Now we are
> back to square one, sort of. This time I do not expect the problem to go
> away by itself since it appears that for most applications the 4-16 Gbyte
> of RAM of a standard computer is more than enough, no need to put in more.
>
> Steffen
>
> --
> ------------------------------------------------------------
> Steffen Dietzel, PD Dr. rer. nat
> Ludwig-Maximilians-Universität München
> Biomedical Center (BMC)
> Head of the Core Facility Bioimaging
>
> Großhaderner Straße 9
> D-82152 Planegg-Martinsried
> Germany
>
> http://www.bioimaging.bmc.med.uni-muenchen.de
>
Arvonn Tully

Re: analyzing really big data sets *Commercial Response*

In reply to this post by Steffen Dietzel

*Commercial Response*
Dear Steffen,
Speaking as Imaris technical support, I would like to point out some
facts, clear up some misconceptions, and explain some choices regarding
Imaris and the processing of large data sets.

Perhaps unwittingly, you are correct that there is no reason for a
program to load the entire image into memory to work on it, and Imaris does
not do that. Imaris does not load the entire data set into RAM to start
processing. As Andrew Dennis pointed out, it took Imaris 60 seconds to
"open" a 1.2 TB image; Imaris does not open the entire image, as it would
be impossible to do so in that time. The Imaris file format (
http://open.bitplane.com/Default.aspx?tabid=268) is based on HDF5 (
https://www.hdfgroup.org) and stores the image at multiple resolution
levels. Effectively, Imaris figures out the current screen size and then
loads the portion and resolution of the image that will fit on screen. This
is nothing new; in fact, Imaris started doing this 10 years ago (okay,
December 2006) in Imaris v5.5. Initially this was limited to visualization,
but over time most functions have become multi-resolution aware.  Imaris
processes your image block by block on the appropriate resolution level,
based on your segmentation parameters.  I recently re-installed Imaris v5.5
on a system to do a little benchmark, and it took about 4 hours to count
all of the nuclei in a 50 GB image; that 10-year-old version of Imaris had
to load the entire image to do that. Imaris v8.3, the current release, did
the same calculation in 2 minutes - and it used considerably less memory to
do it.
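
To make the idea concrete, here is a minimal sketch (not Bitplane code) of
reading only the coarsest pyramid level of an .ims file with Python and
h5py, assuming the HDF5 group layout described on the open.bitplane.com
page ("DataSet/ResolutionLevel r/TimePoint t/Channel c/Data"); the file
name is made up:

import h5py

def read_lowres_volume(path, channel=0, timepoint=0):
    # Only the chosen (small) pyramid level is read into RAM.
    with h5py.File(path, "r") as f:
        dataset = f["DataSet"]
        levels = sorted(int(k.split()[-1]) for k in dataset
                        if k.startswith("ResolutionLevel"))
        coarsest = levels[-1]   # highest index = most downsampled level
        grp = dataset[f"ResolutionLevel {coarsest}"]
        vol = grp[f"TimePoint {timepoint}"][f"Channel {channel}"]["Data"]
        return vol[...]

small = read_lowres_volume("example.ims")   # hypothetical file name
print(small.shape, small.dtype)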

Regarding memory usage - yes, Imaris uses lots of RAM, and that is a good
thing.
Imaris caches as much of the image in memory as possible to speed up
calculations, as most of the calculations are I/O limited. Main CPU memory
bandwidth is often in the 10+ GB/s range, PCIe SSDs are ~2 GB/s, SATA SSDs
are at most ~500 MB/s, and most conventional hard drives are below 200
MB/s. For many images, at some point the entire image must be read by the
CPU; if you can pre-load as much into memory as possible, this will speed
up the calculation, because the CPU can process the image at 10 GB/s
instead of 200 MB/s: 50x faster!
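
For a rough sense of those numbers, a quick back-of-the-envelope
calculation (plain Python; the bandwidths are just the round figures
quoted above, not benchmarks):

size_bytes = 1e12   # a 1 TB dataset

bandwidths = {      # bytes per second, illustrative only
    "HDD (~200 MB/s)": 200e6,
    "SATA SSD (~500 MB/s)": 500e6,
    "PCIe SSD (~2 GB/s)": 2e9,
    "RAM (~10 GB/s)": 10e9,
}

for name, rate in bandwidths.items():
    seconds = size_bytes / rate
    print(f"{name:22s} {seconds:7.0f} s  (~{seconds / 60:.0f} min)")

# 1 TB at 500 MB/s is ~2000 s (~33 min), matching Remko's figure;
# RAM vs. a conventional HDD is the ~50x factor above.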

It is still true that if you are making a change to an image, such as
masking or altering the original data, Imaris will make a temporary copy of
the file in order to keep an "undo" version of the image; this behaviour
can be configured. For users of very large data, we tend to recommend
disabling the "undo" feature to reduce memory usage. Imaris will still use
virtual memory if main RAM is unavailable. Certainly this is something that
can be improved upon.


Regarding the comment about 4-16 GB of RAM being enough: that is not
currently practical for large data analysis. It may be in the future, when
all conventional hard drives have been eliminated and we are using faster
storage, e.g. everyone uses PCIe-SSD-type storage able to sustain 2-5 GB/s.
Once that is the case, I agree that large amounts of RAM will be
unnecessary, because main storage will effectively behave like main memory
and the data can be accessed directly by the CPU.
However, that is not the case today, and it will not be for a few more
years (a separate and very interesting discussion), so until that changes
we will stick with the large-memory recommendation for the best possible
processing.  Even with high-speed networking (10 Gbit/s... not 10 GB/s) the
data transfer rate is more than 10 times slower than accessing RAM.

The points about interfacing with cluster computing systems and about
working on time series before acquisition has finished are still valid;
these are areas I hope Bitplane will be able to address in the future, with
Imaris or a clustered version of Imaris.

Certainly there is more work that can be done to improve processing time
and the handling of sparse data (such as time series which are still being
acquired).

Finally, I would like to echo what Brian Northan wrote: please test any
commercial software before buying.  With that in mind, I encourage everyone
who will be at the Expansion Microscopy workshop (
https://www.janelia.org/you-janelia/conferences/expansion-microscopy-workshop
) at Janelia to comment back on this topic. Matthew Gastinger from
Bitplane will be there; additionally, I believe representatives from the
Arivis and Amira teams will also be participating. It will be an excellent
opportunity to see first-hand how the different software packages process
terabyte-sized images, and how quickly that processing occurs.

That being said, I hope I have been able to help clear up some points.

Best Regards,
Arvonn Tully
Imaris questions : [hidden email]
Other questions: [hidden email]


Andreas Bruckbauer

Re: analyzing really big data sets

In reply to this post by DENNIS Andrew

Dear Dennis,

I still don't get it. Remko wrote that, based on the SSD transfer rate, it takes at least 500 s to load the 1 TB data set, and more typically 30 minutes. How could you load it in 60 s? Please share the details.

I will have a look at the file converter, but you still have to load the file first in order to convert it, so it does not change much; maybe this can run as a batch process. We are still trapped in this endless cycle of saving files, transferring them to storage, transferring them again to another machine (with fast SSDs), potentially converting them to another format, then loading them and analysing... Pretty mad.
I read today that the next generation of video games renders whole exploding cities (pretty mad too) in the cloud and streams the result to gaming consoles or mobile phones - why can we not do this? OK, it is the opposite problem: they have the vector data and want to get pixels out, whereas we have the pixels and want to get objects. But it should be easier to cater for a few science labs than for the world gaming community.

Best wishes

Andreas

-----Original Message-----
From: "DENNIS Andrew" <[hidden email]>
Sent: ‎02/‎08/‎2016 09:35
To: "[hidden email]" <[hidden email]>
Subject: Re: analyzing really big data sets


Hi Andreas,

I was working with an IMS file, so it took about 60 seconds to load the 1.2TB file and it's ready for analysis straight away; I did some spots analysis and tracking.

I think you are using a non-native format, which loads and previews the data; analysis isn't possible until the full file conversion is complete.

I'd very much recommend converting the file to IMS format using the file converter and then loading it into Imaris - it's definitely a better experience. The file converter doesn't need an Imaris licence and can be run on a separate PC if you want.

You mentioned the image database ('Arena'). We've had plenty of feedback from big-data users, and as a result Arena was made an optional part of the Imaris install, so you can now run Imaris without Arena.

I hope this is helpful,

Andrew

-----Original Message-----
From: Confocal Microscopy List [mailto:[hidden email]] On Behalf Of Andreas Bruckbauer
Sent: 30 July 2016 08:40
To: [hidden email]
Subject: Re: analyzing really big data sets



Dear Andrew,

Are you sure you loaded the whole data set in 60s? My experience with Imaris is that it quickly displays a part of the data set, but when you want to do any meaningful analysis (like tracking cells) it really tries to load the full dataset into memory. To analyse data sets of 10-20 GB we need a workstation with 128 GB RAM, while Arivis works with very little RAM.  As I understand it, we are talking here about 100 GB - 5 TB, so loading the full dataset is wholly impractical. Maybe something has changed in recent versions of Imaris? I stopped updating when Imaris introduced this ridiculous database, which fills up the local hard disk. What about using Omero instead?

Best wishes

Andreas

-----Original Message-----
From: "DENNIS Andrew" <[hidden email]>
Sent: ‎29/‎07/‎2016 23:39
To: "[hidden email]" <[hidden email]>
Subject: Re: analyzing really big data sets


Sorry, typo in an embarrassing part of my last message,

It should have said " today I loaded a 1.2TB data set, it took about 60 seconds. "

-----Original Message-----
From: DENNIS Andrew
Sent: 29 July 2016 23:17
To: Confocal Microscopy List <[hidden email]>
Subject: RE: analyzing really big data sets


Hi Esteban,

I work at Andor/Bitplane, so you may consider this to be a commercial response.

I'm interested in your comment on Imaris, today I loaded a 1.2GB data set, it took about 60 seconds. When you refer to Big data, what sizes are you talking about?

Andrew


________________________________________
From: Confocal Microscopy List [[hidden email]] on behalf of G. Esteban Fernandez [[hidden email]]
Sent: 29 July 2016 20:43
To: [hidden email]
Subject: Re: analyzing really big data sets



I just wanted to echo the praises for Arivis and MicroVolution.

My favorite for 3D work is Imaris, but it can't handle big data; when possible I downsample large datasets and work in Imaris. Amira can handle big data, but in my experience it crashed more often than Arivis, and I prefer the user interface in Arivis.

MicroVolution results were comparable to Huygens and AutoQuant in my hands (qualitatively - I didn't do rigorous quantitative comparisons) in about
1/60th of the time with a lower-end GPU. I mostly looked at confocal point-scanning data and didn't try truly big data. MicroVolution is limited to datasets smaller than RAM, so you have to subvolume the data yourself before deconvolving.

-Esteban

On Jul 27, 2016 10:03 AM, "Andreas Bruckbauer" < [hidden email]> wrote:

>
> Would it not be much better to perform the data analysis on a scalable
> cluster with a fast connection to the storage, instead of moving data
> around? We need to push software companies to make their solutions run on
> these machines. Instead of buying ever-bigger analysis workstations, which
> are obsolete after a few years, one would just buy computing time. The
> cluster can be shared with bioinformatics groups.
>
> My take on storage is that you need to have a cheap archive, otherwise
> there will be a point at which you run out of money to keep the ever
> expanding storage.
>
> Best wishes
>
> Andreas
>
> -----Original Message-----
> From: "Douglas Richardson" <[hidden email]>
> Sent: ‎27/‎07/‎2016 15:34
> To: "[hidden email]"
> <[hidden email]>
> Subject: Re: analyzing really big data sets
>
>
> I'll echo Paul's endorsement of Arivis for 3D data sets and George's
> suggestion regarding Visiopharm for 2D data sets (I really love that
> it doesn't duplicate the data into yet another proprietary file type).
>
>
> However, these are both expensive and there are open source options
> as well.  One of our groups has a great open-source workflow for
> imaging and registering cleared brains (imaged & registered >80
> cleared brains, ~150TB of data). Here is the reference:
>
> http://hcbi.fas.harvard.edu/publications/dopamine-neurons-projecting-posterior-striatum-form-ananatomically-distinct
>
> The Tessier-Lavigne lab just released a computational method (ClearMap,
> http://www.sciencedirect.com/science/article/pii/S0092867416305554) for
> a similar process, as has the Ueda group with their CUBIC method (
> http://www.nature.com/nprot/journal/v10/n11/full/nprot.2015.085.html),
> although these both mainly deal with ultramicroscope data, which isn't
> as intensive as other forms of light sheet.
>
> BigDataViewer in Fiji and Vaa3D are also good open source options
> for viewing the data.
>
> On the data storage side, the above mentioned publication was done
> mainly with a filing cabinet full of 2TB USB 3.0 external hard drives.
> Since then, we've run 10Gbit optical fiber to all of our microscopes
> and workstations.  Most importantly, this 10Gbit connection goes right
> through to our expandable storage server downtown.
>
> I think the two big lessons we've learned are the following:
>
> 1) Make sure your storage is expandable; you'll never have enough.
> We're currently at 250TB in a Lustre configuration, with plans to push
> into petabytes soon.
> 2) You will always need to move data, so make sure your connections are fast.
> We have a 3-tier system: 1) microscope acquisition computer > 2)
> processing workstations > 3) long-term storage server.  Connections to
> the cloud are not fast enough, so I don't feel this is an option.
>
> Finally, many versions of commercial microscope acquisition software
> are unable to directly save data to network storage (or external
> drives), no matter how fast the connection. This is a feature we need
> to push the manufacturers for, or else you'll always be limited to the
> storage space on your acquisition computer.
>
> -Doug
>
> On Wed, Jul 27, 2016 at 9:33 AM, Paul Paroutis
> <[hidden email]>
> wrote:
>
> >
> > We have been facing the same issue since the purchase of our Zeiss
> > Lightsheet system. On the commercial side of things, Arivis has worked
> > well for us and I would recommend giving that a shot. On the
> > deconvolution side, we recently purchased the Huygens deconvolution
> > module and it has given us nice results. We had also tested the
> > Microvolution software and were really impressed at the speed and
> > quality of deconvolution - the price tag put it out of our range for
> > the time being, but it's definitely worth exploring.
> >
>


Andreas Bruckbauer

Re: analyzing really big data sets *Commercial Response*

In reply to this post by Arvonn Tully

Dear Arvonn,
I was writing about time series and tracking objects, and in this case the versions of Imaris I used did load the whole dataset into memory. If I don't play the time series at least once, some time points are even missing!

Best wishes

Andreas

Mario Emmenlauer

Re: analyzing really big data sets

In reply to this post by Andreas Bruckbauer

Hi,

On 02.08.2016 19:25, Andreas Bruckbauer wrote:
>
> Dear Dennis,
>
> I still don't get it, Remko wrote that it takes at least 500s based on the SSD transfer rate to load the 1 TB data set and more typically 30 mins. How could you load it in 60s? Please share the details.

There is no real magic here :) The "trick" is that the file contains
multiple downscaled versions of the dataset, a so-called resolution
pyramid. Several modern file formats now have this feature, even though
I think Imaris was one of the early adopters.

In any case, this feature is mostly helpful for fast opening and viewing
of the dataset. Several file formats combine this with a feature for
chunking, to load only certain "blocks" of the higher resolutions into
memory, as needed. This way, when you zoom in to any detail, only those
chunks that encompass the close-up region are loaded in high resolution,
which saves memory and loading time.

For the processing, it depends on the algorithm whether chunking can be
used or not. Not all software, and also not all algorithms, can profit
from chunking, because a transformation may require the full dataset. If
your image analysis software does support chunking, and the algorithm can
work block-wise, then it can run with very low memory requirements. In
good cases you may be able to process datasets of virtually any size with
just enough RAM for a few blocks, like a few MB of RAM! Of course,
eventually the full file needs to be loaded at a high resolution to
process it at this resolution.

To combine the features "resolution pyramid" and "chunking", let's say
your image resolution is much higher than the level of detail you want
to process. In this case, the image processing software may be clever
enough to load the lowest resolution that is sufficient to perform the
task! For example, if you want to detect nuclei that are many times the
size of your image resolution, it might be sufficient to load only the 2x
or 4x or maybe even 8x lower-resolution version of the image from the
resolution pyramid, and perform the detection there. In combination
with chunking, this will be several times faster and more memory
efficient.

But whether your software will do that for a certain task depends on
the software, the file format, and last but not least on the processing
you want to perform! So you would need to check this very precisely
for your application.
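
As a concrete (hypothetical) illustration of block-wise processing, the
sketch below walks a chunked HDF5 volume block by block with Python and
h5py; the file and dataset names are made up, and a real pipeline would
first pick the appropriate pyramid level as described above:

import h5py
import numpy as np

def count_bright_voxels(path, dataset_path, threshold, block=(64, 256, 256)):
    # Only one block is held in RAM at a time, regardless of file size.
    total = 0
    with h5py.File(path, "r") as f:
        data = f[dataset_path]            # e.g. a (Z, Y, X) volume
        nz, ny, nx = data.shape
        for z in range(0, nz, block[0]):
            for y in range(0, ny, block[1]):
                for x in range(0, nx, block[2]):
                    chunk = data[z:z + block[0], y:y + block[1], x:x + block[2]]
                    total += int(np.count_nonzero(chunk > threshold))
    return total

# Usage (hypothetical names):
# n = count_bright_voxels("big_volume.h5", "volume", threshold=1000)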

Cheers,

   Mario






Best regards,

    Mario Emmenlauer


--
BioDataAnalysis GmbH, Mario Emmenlauer      Tel. Buero: +49-89-74677203
Balanstr. 43                   mailto: memmenlauer * biodataanalysis.de
D-81669 München                          http://www.biodataanalysis.de/
John Oreopoulos

Re: analyzing really big data sets


In all my travels around the world installing microscope systems, one facility stood out from the rest in terms of image computing power: the University of Texas Southwestern in Dallas. The following is posted on behalf of Dr. Marcel Mettlen:

All high-end microscopes (TIRF, long-term live-cell, and light sheet microscopes) are connected via 10 Gb/s to our local computer cluster. Raw data is transferred using FileZilla. Regarding hardware, we've invested a lot of thought/money into the right infrastructure (termed BioHPC), which currently includes:
- A 4,700-CPU HPC cluster
- 8 large-memory, GPU-equipped nodes for massively parallel programs using the NVIDIA CUDA toolkit (= 19,968 GPU cores!)
- 24.8 TB RAM
- 3 types of integrated storage systems with different internal data transfer rates: (1) 240 TB @ 4 GB/s, (2) 960 TB @ 6-8 GB/s, (3) 2,500 TB @ 30-40 GB/s
- The whole system is designed for easy scale-up and operated via Red Hat Enterprise Linux 6
- Connection to BioHPC is established using either Linux workstations, desktop clients or virtual machine images (for access via other operating systems). Alternatively, all resources are accessible via web-based graphical user interfaces.
 
More info under: https://portal.biohpc.swmed.edu/content/about/ and https://portal.biohpc.swmed.edu/content/about/systems/
 
For visualization of large (3D) data sets, we've tested several packages such as Amira. Imaris is another good option.
 
For analysis and processing of large data files, we use custom Matlab software. Whatever the language, one should take advantage of multicore parallelization and have computing nodes with significant RAM (currently, we use 256 GB/node). An alternative is to rely on GPUs.
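
As a language-neutral illustration of that advice (a hedged sketch in Python rather than the custom Matlab code mentioned above; the function, dimensions, and per-block statistic are made up), blocks of a large volume can be farmed out across CPU cores like this:

from multiprocessing import Pool
import numpy as np

def process_block(bounds):
    # Hypothetical per-block analysis; replace with real I/O + segmentation.
    z0, z1 = bounds
    block = np.random.rand(z1 - z0, 512, 512)   # stand-in for reading slices z0..z1
    return float(block.mean())                  # e.g. a per-block statistic

if __name__ == "__main__":
    n_slices, depth = 1024, 64
    bounds = [(z, min(z + depth, n_slices)) for z in range(0, n_slices, depth)]
    with Pool() as pool:                        # one worker per CPU core by default
        results = pool.map(process_block, bounds)
    print(f"processed {len(results)} blocks, overall mean {np.mean(results):.3f}")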
 
 
Hope that helps,
 
Marcel
 
Assistant Professor and
Director of Research and Collaborations
Dept of Cell Biology, NL5.120GA
6000 Harry Hines Blvd
Dallas, TX 75235-9039

John Oreopoulos



> D-81669 München                          http://www.biodataanalysis.de/
Martin Wessendorf-2 Martin Wessendorf-2
Reply | Threaded
Open this post in threaded view
|

Re: analyzing really big data sets

*****
To join, leave or search the confocal microscopy listserv, go to:
http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
Post images on http://www.imgur.com and include the link in your posting.
*****

John--

I realize that this isn't your system, but can you give us an idea of
what a system like this can actually do--and, if it were scaled up (as
it apparently can be), what it would be capable of?   The numbers sound
big but don't mean much to a guy who still thinks four Gbyte of RAM is
pretty good!

Thanks!

Martin Wessendorf




On 8/3/2016 2:27 AM, John Oreopoulos wrote:

> *****
> To join, leave or search the confocal microscopy listserv, go to:
> http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
> Post images on http://www.imgur.com and include the link in your posting.
> *****
>
> In all my travels around the world installing microscope systems, one facility stood out from the rest in terms of image computing power, residing at the University of Texas Southwestern in Dallas. The following is posted on behalf of Dr. Marcel Mettlen:
>
> All high-end microscopes (TIRF-, long-term-live-cell-, and light sheet microscopes) are connected via 10Gb/s to our local computer cluster. Raw data is transferred using FileZilla. Regarding hardware, we've invested a lot of thought/money into the right infrastructure (termed BioHPC), which currently includes:
> - A 4700-CPU HPC cluster
> - 8 large-memory, GPU-equipped nodes for massively parallel programs using the NVIDIA CUDA toolkit (= 19,968 GPU cores!)
> - 24.8 TB RAM
> - 3 types of integrated storage systems with different internal data transfer rates: (1) 240 TB @ 4 GB/s, (2) 960 TB @ 6-8 GB/s, (3) 2500 TB @ 30-40 GB/s
> - The whole system is designed for easy scale-up and operated via Red Hat Enterprise Linux 6
> - Connection to BioHPC is established using either Linux workstations, desktop clients or virtual machine images (for access via other operating systems). Alternatively, all resources are accessible via web-based graphical user interfaces.
>  
> More info at: https://portal.biohpc.swmed.edu/content/about/ and https://portal.biohpc.swmed.edu/content/about/systems/
>  
> For visualization of large (3D) data sets, we've tested several packages such as Amira. Imaris is another good option.
>  
> For analysis and processing of large data files, we use custom Matlab software. Whatever the language, one should take advantage of multicore parallelization and have computing nodes with significant RAM (currently, we use 256 GB/node). An alternative is to rely on GPUs.
>  
>  
> Hope that helps,
>  
> Marcel
>  
> Assistant Professor and
> Director of Research and Collaborations
> Dept of Cell Biology, NL5.120GA
> 6000 Harry Hines Blvd
> Dallas, TX 75235-9039
>
> John Oreopoulos

--
Martin Wessendorf, Ph.D.                   office: (612) 626-0145
Assoc Prof, Dept Neuroscience                 lab: (612) 624-2991
University of Minnesota             Preferred FAX: (612) 624-8118
6-145 Jackson Hall, 321 Church St. SE    Dept Fax: (612) 626-5009
Minneapolis, MN  55455                    e-mail: [hidden email]
Tim Feinstein Tim Feinstein
Reply | Threaded
Open this post in threaded view
|

Re: analyzing really big data sets

*****
To join, leave or search the confocal microscopy listserv, go to:
http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
Post images on http://www.imgur.com and include the link in your posting.
*****

Hi folks,

HPC is the way to go, but a high-performance cluster will cost (WAY) more
than most regular-sized imaging groups are ready to spend without major
grant support. In addition to the hardware, you need dedicated staff to
support it.  In my last job our institute was just starting to get
hammered with increased demand from imaging (large volume spectral
datasets were the worst offenders for us), next-gen sequencing and
structural labs all at the same time, so we set up a committee to address
everyone's needs with one HPC.  We ended up instituting an expandable
system with image data access managed by OMERO and sequencing data handled
by openBIS.  Having structured data management tools is a really big deal,
for example to ensure that data from various grants are preserved for the
appropriate length of time and making it easy to 'pop out' and archive a
postdoc or student's work when they leave.  The needs of the structural
labs were so ludicrous that they just kept working with international
consortia.  All other storage needs were basically a rounding error.

I think you will get the most traction if you find groups doing next-gen
sequencing and other BIG DATA projects and develop a joint HPC plan.
You can configure some nodes for sequencing work (processor/RAM) and some
for imaging data (GPUs) but you still get a major economy of scale benefit
by working together; plus you have more leverage persuading the
administration to fund it.  In the future I think instrument grants like
the S10 should be changed to include more of an allowance for data
handling when appropriate; groups buying e.g. a Zeiss Z.1 for core use are
likely to discover another $50k in data costs unless they pass on the
storage/analysis burden to each user.

I will look into the Keller group's compression and analysis tools
(Efficient Processing and Analysis of Large-Scale Light Sheet Microscopy
Data, Nature Protocols 2013) when our light sheet activity ramps up more.

Best,


Tim

Timothy Feinstein, Ph.D.
Research Scientist
University of Pittsburgh Department of Developmental Biology






Kate Luby-Phelps Kate Luby-Phelps
Reply | Threaded
Open this post in threaded view
|

Re: analyzing really big data sets

In reply to this post by Watkins, Simon C-2
*****
To join, leave or search the confocal microscopy listserv, go to:
http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
Post images on http://www.imgur.com and include the link in your posting.
*****

To clarify the post by John Oreopoulos: the BioHPC at UT Southwestern is the brainchild of Gaudenz Danuser and was initially a collaboration of three departments that pooled resources to purchase the first nodes of the cluster. It continues to evolve and is now on track to be the HPC resource for the whole research community at Southwestern. The website is https://biohpc.swmed.edu/content/. The director of the facility is Dr. Liquang Wang.

So far the HPC has done a lot to alleviate bottlenecks for the transfer and storage of big data, and for those with access to software engineers who can write custom algorithms for specific tasks designed for distributed computing on Linux systems, it has been really enabling. Recently they have been expanding their GPU capabilities as well.

From my point of view as the director of the institutional imaging core facilities, it hasn't yet been a game changer for the analysis of big data sets, because the all-purpose biological image processing packages generally are not written to take advantage of cluster or GPU computing. I realize this too is evolving in the right direction, but not fast enough. There are also still hardware incompatibility issues with some of the GPU-based applications that I'm sure will eventually get sorted out. Meanwhile we still use Imaris and AutoQuant on honking big workstations.
Mario Emmenlauer Mario Emmenlauer
Reply | Threaded
Open this post in threaded view
|

Re: analyzing really big data sets

In reply to this post by Tim Feinstein
*****
To join, leave or search the confocal microscopy listserv, go to:
http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
Post images on http://www.imgur.com and include the link in your posting.
*****

On 03.08.2016 16:59, Feinstein, Timothy N wrote:
> Hi folks,
>
> I will look into the Keller group's compression and analysis tools
> (Efficient Processing and Analysis of Large-Scale Light Sheet Microscopy
> Data, Nature Protocols 2013) when our light sheet activity ramps up more.

In my very humble opinion, the Keller group's compression seemed a bit
overrated. When I checked their released source code at the end of 2015,
the implemented compressions were basically gzip and bzip2, both of which
go back to the childhood days of computing and tend to have worse
compression ratios than more "modern" methods. I appreciate that the
Keller group promotes parallelization, which makes better use of modern
CPUs, but for gzip and bzip2 this is also not really new. Basically, you
have been able to get the same thing for many years in efficient,
well-established file formats, see HDF5 or NetCDF. NASA is only one part
of their large user base, as is Imaris. HDF5 is already supported by many
readers including Fiji, and it has great support for advanced features
like linking multiple files (to chain time series on the fly), transparent
compression, loading of sub-regions of the file ("chunking"), and many
other goodies. I would love it if an open, HDF5-based file format became
the microscopy standard that's been missing since forever :-)
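
Just to illustrate what chunking plus transparent compression buy you,
here is a minimal sketch in Python with h5py; the file name, dataset
name, sizes and chunk shape are made up for the example:

import h5py
import numpy as np

# Write a small 3D stack as a chunked, gzip-compressed HDF5 dataset.
# Sizes and names are placeholders, not a real acquisition.
volume = np.random.randint(0, 4096, size=(64, 512, 512), dtype=np.uint16)
with h5py.File("example_stack.h5", "w") as f:
    f.create_dataset("t0/channel0", data=volume,
                     chunks=(16, 128, 128),   # stored as independent blocks
                     compression="gzip", compression_opts=4)

# Read back only a sub-region: h5py touches just the chunks that overlap
# this slice instead of decompressing the whole volume.
with h5py.File("example_stack.h5", "r") as f:
    crop = f["t0/channel0"][16:32, 200:328, 200:328]
    print(crop.shape, crop.dtype)

As far as I understand, the Imaris .ims format and Fiji's BigDataViewer
HDF5 format are essentially this idea, with a resolution pyramid stored
alongside the full-resolution data.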

But that's really just my two cents.

Cheers,

    Mario




> Best,
>
>
> Tim
>
> Timothy Feinstein, Ph.D.
> Research Scientist
> University of Pittsburgh Department of Developmental Biology



Viele Gruesse,

    Mario Emmenlauer


--
BioDataAnalysis GmbH, Mario Emmenlauer      Tel. Buero: +49-89-74677203
Balanstr. 43                   mailto: memmenlauer * biodataanalysis.de
D-81669 München                          http://www.biodataanalysis.de/
Tim Feinstein Tim Feinstein
Reply | Threaded
Open this post in threaded view
|

Re: analyzing really big data sets

*****
To join, leave or search the confocal microscopy listserv, go to:
http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
Post images on http://www.imgur.com and include the link in your posting.
*****

Hi Mario,

I appreciate your point about the compression not being terribly advanced.
 The thing that got my attention was discriminating between 'foreground'
and 'background' aspects of the image.  This is potentially a very big
deal for me.  One of my collaborators is using the light sheet to make a
'globe' of labeling on the surface of an optically opaque Xenopus embryo.
In this case the data is essentially a two-dimensional 'skin' that
represents the surface of a spheroid; discarding background would reduce
those (gigantic) file sizes by about 90%.  Is this commonly done using
existing compression techniques?  It is possible that some software
already does this; Imaris is an expensive package so I am mostly using
freeware like Fiji plugins and TeraStitcher at the moment.
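
For what it's worth, even without a special format you can get much of
that saving by zeroing voxels below a background cutoff and then writing
with ordinary lossless compression, since long runs of zeros cost almost
nothing on disk. A rough sketch in Python (the file names and the
percentile cutoff are placeholders, not a tested recipe for light sheet
data):

import numpy as np
import tifffile

# Load one tile / time point; the file name is a placeholder.
stack = tifffile.imread("embryo_tile_000.tif")

# Crude background estimate: if the signal is a thin surface 'skin',
# most voxels are background, so a high percentile works as a rough
# cutoff. A real pipeline would use a proper background model or mask.
cutoff = np.percentile(stack, 90)
stack = np.where(stack < cutoff, 0, stack)

# Lossless compression then shrinks the zeroed background to almost
# nothing (compression="zlib" assumes a reasonably recent tifffile).
tifffile.imwrite("embryo_tile_000_masked.tif", stack, compression="zlib")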

All the best,


Tim

Timothy Feinstein, Ph.D.
Research Scientist
University of Pittsburgh Department of Developmental Biology





Guy Cox-2 Guy Cox-2
Reply | Threaded
Open this post in threaded view
|

Re: analyzing really big data sets

*****
To join, leave or search the confocal microscopy listserv, go to:
http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
Post images on http://www.imgur.com and include the link in your posting.
*****

What you need is a program that does surface, rather than volume, rendering.  Extracting the surface from such a data set should be easy, and once you've done that you have all your information in something more like 1% than 10% of your original file size.  What is more, that approach can use the hardware tricks for polygon rendering built into graphics cards intended for the gaming market.  However, I've been retired too long to be able to recommend a specific program.
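
Not a recommendation for any particular package, but as a sketch of the idea: the usual scientific Python stack can already pull such a surface out as a triangle mesh with marching cubes, e.g. via scikit-image (the file name and intensity level below are placeholders):

import numpy as np
from skimage import measure

# One channel of the stack as a 3D NumPy array; the file name and the
# intensity level used below are placeholders.
volume = np.load("embryo_volume.npy")

# Extract the isosurface as a triangle mesh: verts holds xyz coordinates,
# faces indexes into verts to form triangles.
verts, faces, normals, values = measure.marching_cubes(volume, level=500)

# The mesh is a small fraction of the voxel data and renders well on a GPU.
print(f"{volume.size} voxels -> {len(verts)} vertices, {len(faces)} triangles")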

          Guy

Guy Cox, Honorary Associate Professor
School of Medical Sciences

Australian Centre for Microscopy and Microanalysis,
Madsen, F09, University of Sydney, NSW 2006

-----Original Message-----
From: Confocal Microscopy List [mailto:[hidden email]] On Behalf Of Feinstein, Timothy N
Sent: Saturday, 6 August 2016 5:30 AM
To: [hidden email]
Subject: Re: analyzing really big data sets

*****
To join, leave or search the confocal microscopy listserv, go to:
http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
Post images on http://www.imgur.com and include the link in your posting.
*****

Hi Mario,

I appreciate your point about the compression not being terribly advanced.
 The thing that got my attention was discriminating between 'foreground'
and 'background' aspects of the image.  This is potentially a very big deal for me.  One of my collaborators is using the light sheet to make a 'globe' of labeling on the surface of an optically opaque Xenopus embryo.
In this case the data is essentially a two-dimensional 'skin' that represents the surface of a spheroid; discarding background would reduce those (gigantic) file sizes by about 90%.  Is this commonly done using existing compression techniques?  It is possible that some software already does this; Imaris is an expensive package so I am mostly using freeware like Fiji plugins and Terastitcher at the moment.
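
For a rough feel of what discarding background can buy, here is a tiny illustration with made-up numbers (not any particular package): once roughly 90% of a volume is zeroed out, even plain gzip shrinks it to about the size of the remaining foreground, because long runs of identical values cost almost nothing.

import gzip
import numpy as np

vol = np.random.randint(0, 4096, size=(64, 512, 512), dtype="uint16")  # fake data
mask = np.zeros_like(vol, dtype=bool)
mask[:, :50, :] = True            # pretend only a thin "skin" is foreground
vol[~mask] = 0                    # discard background

raw = vol.tobytes()
print(len(raw), len(gzip.compress(raw)))   # compressed size is a small fraction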

All the best,


Tim

Timothy Feinstein, Ph.D.
Research Scientist
University of Pittsburgh Department of Developmental Biology





On 8/4/16, 8:10 PM, "Confocal Microscopy List on behalf of Mario Emmenlauer" <[hidden email] on behalf of [hidden email]> wrote:

>*****
>To join, leave or search the confocal microscopy listserv, go to:
>http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
>Post images on http://www.imgur.com and include the link in your posting.
>*****
>
>On 03.08.2016 16:59, Feinstein, Timothy N wrote:
>> Hi folks,
>>
>> I will look into the Keller group's compression and analysis tools  
>>(Efficient Processing and Analysis of Large-Scale Light Sheet
>>Microscopy  Data, Nature Protocols 2013) when our light sheet activity
>>ramps up more.
>
>In my very humble opinion, the Keller group's compression seemed a bit
>overrated. When I checked their released source code at the end of 2015, the
>implemented compressions were basically gzip and bzip2, both of which
>go back to the childhood days of computing. They tend to have
>inferior compression compared to more "modern" formats. I appreciate
>that the Keller group promotes the use of parallelization, which makes
>better use of modern CPUs, but for gzip and bzip2 this is also not really new.
>Basically, you have been able to get the same thing in an efficient,
>well-established file format for many years, see HDF5 or NetCDF. NASA
>is just one member of their large user base, as is Imaris. HDF5 is already
>supported by many readers including Fiji, and it has great support for
>advanced features like linking multiple files (to chain time series on
>the fly), transparent compression, loading of sub-regions of the file
>("chunking"), and many other goodies. I would love it if an open, HDF5-based
>file format became the microscopy standard that has been missing since
>forever :-)
>
>But that's really just my two cents.
>
>Cheers,
>
>    Mario
>
>
>
>
>> Best,
>>
>>
>> Tim
>>
>> Timothy Feinstein, Ph.D.
>> Research Scientist
>> University of Pittsburgh Department of Developmental Biology
>
>
>
>Viele Gruesse,
>
>    Mario Emmenlauer
>
>
>--
>BioDataAnalysis GmbH, Mario Emmenlauer      Tel. Buero: +49-89-74677203
>Balanstr. 43                   mailto: memmenlauer * biodataanalysis.de
>D-81669 München  
>http://www.biodataanalysis.de/
Mario Emmenlauer Mario Emmenlauer
Reply | Threaded
Open this post in threaded view
|

Re: analyzing really big data sets

In reply to this post by Tim Feinstein
*****
To join, leave or search the confocal microscopy listserv, go to:
http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
Post images on http://www.imgur.com and include the link in your posting.
*****

Hi Tim,

On 05.08.2016 21:30, Feinstein, Timothy N wrote:

> I appreciate your point about the compression not being terribly advanced.
>  The thing that got my attention was discriminating between 'foreground'
> and 'background' aspects of the image.  This is potentially a very big
> deal for me.  One of my collaborators is using the light sheet to make a
> 'globe' of labeling on the surface of an optically opaque Xenopus embryo.
> In this case the data is essentially a two-dimensional 'skin' that
> represents the surface of a spheroid; discarding background would reduce
> those (gigantic) file sizes by about 90%.  Is this commonly done using
> existing compression techniques?  It is possible that some software
> already does this; Imaris is an expensive package so I am mostly using
> freeware like Fiji plugins and Terastitcher at the moment.

This is actually a very good point. We work with zebrafish in one project
and it's indeed 90% background. HDF5 has a very cool feature: when
chunking is used, only those chunks that contain actual data occupy space.
So it's possible to allocate a 10000x10000x10000 SPIM image in HDF5 that
takes zero space on the disk (a few kilobytes, to be fair). Only when you
start writing to certain locations of the file (where your foreground is)
do those blocks use space (additionally, you can apply one of several
compressions to the blocks). But HDF5 itself will not make the decision
between foreground and background. You would need software that performs
some thresholding or a more clever operation to decide where the foreground
is. So this is not readily implemented. However, it seems relatively easy
to implement in Fiji/ImageJ for example, directly in a file converter
as an option. Does the Keller group's software do something clever there,
or is it a simple thresholding? This would be no more than a day's work,
I assume...
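
A minimal sketch of what that looks like with h5py (names and sizes are purely illustrative): allocate a huge chunked, compressed dataset that costs essentially nothing on disk, then write only the region that contains foreground.

import numpy as np
import h5py

with h5py.File("spim_sparse.h5", "w") as f:
    dset = f.create_dataset(
        "data",
        shape=(10000, 10000, 10000),   # logical size ~1 TB at 8 bit
        dtype="uint8",
        chunks=(64, 64, 64),           # only chunks that are written occupy disk space
        compression="gzip",            # transparent per-chunk compression
    )
    # write one small foreground region; only the chunks it touches are allocated
    foreground = np.random.randint(0, 255, size=(64, 64, 64), dtype="uint8")
    dset[5000:5064, 5000:5064, 5000:5064] = foreground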

The coolest thing about HDF5 is not that this kind of fancy writing is
possible. The coolest thing is that the reader does not need to care. So
you can open a chunked or non-chunked, compressed or non-compressed (etc.)
image in all sorts of readers, as long as the required metadata
is present. Let's say for your Xenopus you only write the blocks that have
foreground, and you compress the blocks with bzip2; then in principle it
should be trivial to open your fancy compressed image in, e.g., Imaris,
BigDataViewer, or whatever software has an HDF5 reader! So it's quite
an "easy" format for the reader, which I think makes it powerful, because
most readers should easily be made compatible with most files. The only
prerequisite is that Imaris or BigDataViewer or whatever find their metadata,
but open-source writers for these formats exist, so they could be
relatively easily adapted for such a use case.
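
On the reader side the same point holds; a sketch assuming h5py and hypothetical dataset names: a sub-region is read the same way whether the file is sparse, chunked or compressed, and only the touched chunks are decompressed.

import h5py

with h5py.File("xenopus_spim.h5", "r") as f:
    dset = f["/t0/channel0"]                      # hypothetical dataset layout
    roi = dset[100:200, 1024:2048, 1024:2048]     # loads only this sub-volume
    print(roi.shape, roi.dtype)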

Cheers,

    Mario





> All the best,
>
>
> Tim
>
> Timothy Feinstein, Ph.D.
> Research Scientist
> University of Pittsburgh Department of Developmental Biology
>
>
>
>
>
> On 8/4/16, 8:10 PM, "Confocal Microscopy List on behalf of Mario
> Emmenlauer" <[hidden email] on behalf of
> [hidden email]> wrote:
>
>> *****
>> To join, leave or search the confocal microscopy listserv, go to:
>> http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
>> Post images on http://www.imgur.com and include the link in your posting.
>> *****
>>
>> On 03.08.2016 16:59, Feinstein, Timothy N wrote:
>>> Hi folks,
>>>
>>> I will look into the Keller group's compression and analysis tools
>>> (Efficient Processing and Analysis of Large-Scale Light Sheet Microscopy
>>> Data, Nature Protocols 2013) when our light sheet activity ramps up
>>> more.
>>
>> In my very humble opinion, the Keller group's compression seemed a bit
>> overrated. When I checked their released source code at the end of 2015, the
>> implemented compressions were basically gzip and bzip2, both of which
>> go back to the childhood days of computing. They tend to have inferior
>> compression compared to more "modern" formats. I appreciate that the
>> Keller group promotes the use of parallelization, which makes better use
>> of modern CPUs, but for gzip and bzip2 this is also not really new.
>> Basically, you have been able to get the same thing in an efficient,
>> well-established file format for many years, see HDF5 or NetCDF. NASA is
>> just one member of their large user base, as is Imaris. HDF5 is already
>> supported by many readers including Fiji, and it has great support for
>> advanced features like linking multiple files (to chain time series on the
>> fly), transparent compression, loading of sub-regions of the file
>> ("chunking"), and many other goodies. I would love it if an open, HDF5-based
>> file format became the microscopy standard that has been missing since forever :-)
>>
>> But that's really just my two cents.
>>
>> Cheers,
>>
>>    Mario



Viele Gruesse,

    Mario Emmenlauer


--
BioDataAnalysis GmbH, Mario Emmenlauer      Tel. Buero: +49-89-74677203
Balanstr. 43                   mailto: memmenlauer * biodataanalysis.de
D-81669 München                          http://www.biodataanalysis.de/
Neil Anthony Neil Anthony
Reply | Threaded
Open this post in threaded view
|

Re: analyzing really big data sets

In reply to this post by Mario Emmenlauer
*****
To join, leave or search the confocal microscopy listserv, go to:
http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
Post images on http://www.imgur.com and include the link in your posting.
*****

Hi guys, great topic!

I've been looking into the option of handling large data sets with FEI Amira and arivis Vision 4D.  We currently have Imaris but find it hangs/crashes when analysing medium-sized data sets on a reasonably capable workstation (10-core E5-2650 v3 CPU, 64 GB RAM, 4 GB K4200 GPU).  I think it might be the single-threaded analyses that slow it down.  It has been locking up on calculating tracking stats recently, and I suspect there is a limit on the number of faces in the surfaces, as I can get it to work fine on a certain-sized ROI above which it really struggles.  I will say that the file converter is a great idea; you can leave it to run everything in a batch, and it doesn't need to open the files, it just converts them.  Then the HDF5 format speeds things up for opening/viewing.

I think the major points are those made by Mario regarding chunks, pyramids and analysis on HDF5 data.  Does anybody know more about Amira and Vision4D and their implementation of chunks in the analysis?

I understand from conversations with FEI and arivis that segmentation and tracking are single-threaded (and empirically that seems true of Imaris too), but I've always wondered why...  If there were just one thing to track or one intensity maximum to segment I'd understand, but surely there's a way to segment or track by chunks and then work out afterwards the cases where things overlap...?
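
As a hedged sketch of one way this can be done today (not how Imaris, Amira or Vision4D actually work internally), dask and dask-image can threshold and label a volume chunk by chunk and then reconcile the labels across chunk borders afterwards. File name, dataset path and threshold are illustrative.

import h5py
import dask.array as da
from dask_image import ndmeasure

f = h5py.File("embryo.h5", "r")                          # keep open while dask computes
vol = da.from_array(f["/data"], chunks=(64, 256, 256))   # lazy, chunk-wise view

mask = vol > 500                       # placeholder global threshold, applied per chunk
labels, num = ndmeasure.label(mask)    # labels objects, merging across chunk borders
print("objects found:", int(num))      # triggers the chunked computation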

[quick detour....]
My personal hope is that the future comes a little sooner with RAM and data storage, i.e. XPoint memory!  It won't be long before we have PB storage in the workstation that serves as the RAM and the drive at the same time.  One can envision a reversal of the motherboard layout in these crazy number cases, where the RAM is in the middle and the CPUs etc are on the outside.  Place the PB XPoint in the middle and surround it with multicore CPU/GPUs on every side available, 2 units on each axis.   I think that ought to do it, don't you ;)

Thanks
Neil


-----Original Message-----
From: Confocal Microscopy List [mailto:[hidden email]] On Behalf Of Mario Emmenlauer
Sent: Thursday, August 04, 2016 8:10 PM
To: [hidden email]
Subject: Re: analyzing really big data sets

*****
To join, leave or search the confocal microscopy listserv, go to:
http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
Post images on http://www.imgur.com and include the link in your posting.
*****

On 03.08.2016 16:59, Feinstein, Timothy N wrote:
> Hi folks,
>
> I will look into the Keller group's compression and analysis tools
> (Efficient Processing and Analysis of Large-Scale Light Sheet
> Microscopy Data, Nature Protocols 2013) when our light sheet activity ramps up more.

In my very humble opinion, the Keller group's compression seemed a bit overrated. When I checked their released source code at the end of 2015, the implemented compressions were basically gzip and bzip2, both of which go back to the childhood days of computing. They tend to have inferior compression compared to more "modern" formats. I appreciate that the Keller group promotes the use of parallelization, which makes better use of modern CPUs, but for gzip and bzip2 this is also not really new.
Basically, you have been able to get the same thing in an efficient, well-established file format for many years, see HDF5 or NetCDF. NASA is just one member of their large user base, as is Imaris. HDF5 is already supported by many readers including Fiji, and it has great support for advanced features like linking multiple files (to chain time series on the fly), transparent compression, loading of sub-regions of the file ("chunking"), and many other goodies. I would love it if an open, HDF5-based file format became the microscopy standard that has been missing since forever :-)

But that's really just my two cents.

Cheers,

    Mario




> Best,
>
>
> Tim
>
> Timothy Feinstein, Ph.D.
> Research Scientist
> University of Pittsburgh Department of Developmental Biology



Viele Gruesse,

    Mario Emmenlauer


--
BioDataAnalysis GmbH, Mario Emmenlauer      Tel. Buero: +49-89-74677203
Balanstr. 43                   mailto: memmenlauer * biodataanalysis.de
D-81669 München                          http://www.biodataanalysis.de/
