Confocal Microscopy List

Re: GPU-based deconvolution

Posted by George McNamara on
URL: http://confocal-microscopy-list.275.s1.nabble.com/GPU-based-deconvolution-tp7582498p7582561.html

*****
To join, leave or search the confocal microscopy listserv, go to:
http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
Post images on http://www.imgur.com and include the link in your posting.
*****

Hi Brian,

Thanks for your explanation of the FFT and cuFFT, reminder of the EPFL
test suite (which Bruce and Butte used in their paper), and especially
that Bob Pepin has posted 'yet another' GPU deconvolution. You gave the
library (source code) link - that web page has the URL for the comiled
programs, https://github.com/bobpepin/YacuDecu/releases .. ex. to run
with MatLab Runtime library, a'la CellProfiler. It would be great if
CellProfiler could incorporate both the Pepin GPU deconvolution (and
maybe get some tips from UNC Clarity library and COSM?) and Adam Hoppe's
3DFSR (61 lines of Matlab, 10x SNR improvement).

With respect to comparisons, in summer of 2013 our lab hosted Ms. Vinita
Popat, and undergraduate at Cornell Univ, who used (part of) the EPFL
test suite on many different algorithms, and some of our microscopy
data, and was able to get a temporary license from SVI Huygens. Bruce &
Butte's GPU deconvolution was much faster than everything else, and for
the most part, gave comparable results to Huygens. Vinita's results are
why I have focused on B&B's deconvolution. I had previously rarely used
Zeiss Axiovision's deconvolution in the core I ran in Miami - sometimes
nice results, slow, bad user interface (deconvolution should really by
'headless', to use the ImageJ2 term: no user interface). In particular,
Zeiss's "most' quantitative mode was painfully slow and featured a
slider that ranged from (something like) "less" to "more" (the Zeiss
widefield and confocal folks for Miami recommended about halfway between
the midpoint to far right). I believe Huygens is 'under the hood' of the
Leica LAS AF confocal software for the SP5 and SP8 - I've used it:
unnecessary user interface stuff, "batch" has never worked for me, user
is left to guess what settings to use, even though the instrument just
acquired the data and 'knows' the pixel size, Z step size, wavelengths.
The Applied Precision/GE Healthcare deconvolution mode in OMX is
"almost" real-time, but using it for deconvolution is 3x more expensive
(and smaller field of view) than a Deltavision system (Deltavision does
have a price and quantitation advantage over 'baby' point scanning
confocals). OMX deconvolution "almost' real time speed is in part
because they have 512x512 pixel EMCCD camera(s), whereas I am using a
FLASH4.0 with 2048x2048 pixels.

Sincerely,

George
p.s. another item with respect to SVI Huygens: they introduced "Titan"
several months ago, but instead of using that for NVidia TITAN card
deconvolution, it is their name of a database thingy. Duh.

On 8/22/2014 7:23 AM, Brian Northan wrote:

> Hi George
>
> "I checked again, and my hunch was correct: it's a fourier
> transform artefact. If you want to get around that from occurring,
> pad to 36 slices with black. Powers of 2, 3, 5, 7 (and
> combinations of those) remain speedy without taking up too much
> extra ram. I didn't have any memory metrics reported in that
> ImageJ plugin--too many synchronization points from java to the gpu."
>
>
> 1. FFT libraries often come with a utility that returns the "nearest"
> fast size for padding but last I used Cuda NVIDIA did not have this.
> Older versions of cufft were much faster at powers of 2 as compared to
> fftw. By this I mean cufft was ~10x faster at power 2, but only
> slightly faster at non-power of 2. This may have been improved in the
> latest version of CUDA.
>
> 2. It would be important to do a very rigorous comparison of Bruce
> and Butt results to other algorithms. I know their algorithm is based
> on RL but I'm not sure what type of regularization and acceleration
> their algorithm uses, this could change the results quite a bit.
> Would be interesting to see a comparision to EPFL (Big Lab)
> algorithms, Huygens, Cell Sens, Zeiss, Autoquant.
>
> 3. Padding is tricky. Padding by zero can lead to artifacts as
> well. So a more subtle strategy to deal with edge artifacts is
> needed. EPFL lab ran the last "deconvolution challenge" and
> recomended using non-circulant convolution model
> (http://bigwww.epfl.ch/deconvolution/challenge/index.html?p=documentation/theory/forwardmodel,
> Ask for their Matlab code as it makes it clearer what they are
> doing). However their strategy can lead to large image buffers and
> reduced speed (depending on the extent of the PSF). But there are
> other strategies to deal with edges. David Biggs has a section about
> it in his thesis
> (https://researchspace.auckland.ac.nz/handle/2292/1760) (15+ years
> old, but a lot of good information)
>
> 4. There is an open version of GPU deconvolution by Bob Pepin.
> https://github.com/bobpepin/YacuDecu. I believe it is the standard
> Richardson Lucy, but could be a good starting point for something
> else. There are wrappers for Matlab and Imaris.
>
> Brian
>
>
>
>
>
>
> On Thu, Aug 21, 2014 at 10:27 PM, George McNamara
> <[hidden email] <mailto:[hidden email]>> wrote:
>
> *****
> To join, leave or search the confocal microscopy listserv, go to:
> http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
> Post images on http://www.imgur.com and include the link in your
> posting.
> *****
>
> Dear Confocal Listserv,
>
> With respect to my previous post about Bruce & Butte's NVidia CUDA
> GPU deconvolution bug, I exchanged emailes with Dr. Marc Bruce and
> Prof. Manish Butte, and spoke with Manish today. Here's what I
> learned.
>
> There is a workaround: Padding the acquisition with several extra
> image planes at the beginning does work - per Marc Bruce's
> suggestion below. In fact, on Wednesday I acquired a 32 plane
> Z-series of a novel specimen, in which I started even further in
> the coverglass than usual (63x, 1.4NA lens, 158 nm XY pixel size,
> 300 nm Z-steps, 20 ms exposure time for the Alexa Fluor 568
> immunostaining). B&B's deconvolution took under 5 seconds (each
> channel) for 1024x1024 pixels, 32 planes. TITAN Z The first 8
> planes exhibited the bug, I deleted them once I had the data back
> in MetaMorph. The planes with the cell data was in one word:
> spectacular. To give credit where credit is due: my colleague
> whose experiment we were imaging had 'nailed the landing' on
> experiment design and implementation (I was just the
> microphotographer). It would be great if GPU (and/or Xeron/Xeon
> Phi/Knights Landing, see earlier posts) could give us "instant
> gratification" clean-up of our confocal and widefield images (more
> on this later).
>
> Both Marc (message below) and Manish pointed out that for the CUDA
> GPU deconvolution (and maybe FFT's in general???),
> "Powers of 2, 3, 5, 7 (and combinations of those) remain speedy" -
> which I was surprised at, even though I've been dealing with FFT's
> for a long time since converting (Fortran to Turbo Pascal) and
> extended (mean harmonics of several shapes then inverse FFT with
> select harmonics) Rohlf & Archie's Elliptical Fourier Analysis of
> shapes, http://sysbio.oxfordjournals.org/content/33/3/302
> This is the basis for the EFA available in MetaMorph's Integrated
> Morphometry Analysis ("IMA)" / Morphometry module (turn on the
> option in Preferences to get access to the parameters).
>
> Quoting Marc (newer message in first paragraph):
>
> "I checked again, and my hunch was correct: it's a fourier
> transform artefact. If you want to get around that from occurring,
> pad to 36 slices with black. Powers of 2, 3, 5, 7 (and
> combinations of those) remain speedy without taking up too much
> extra ram. I didn't have any memory metrics reported in that
> ImageJ plugin--too many synchronization points from java to the gpu."
>
> "I've been trying to look into your problem, but I just
> defended my thesis (unrelated to microscopy) so I've been
> otherwise swamped for the past several months. If the first
> several planes are bad but it's otherwise fine, then NVIDIA's FFT
> library likely has a bug; it's probably a fourier wrapping
> artefact. There might be a hardware bug in your card - I don't
> have one of those. Try adding a few slices of black to the top and
> bottom (i.e. pad to 36 slices). Keep in mind that having your card
> drive two large resolution monitors ties up quite a bit of that 6
> gb on its own."
>
> >From Manish:
>
> * The bug appears to be in NVidia's library, and may have been
> fixed in NVidia's CUDA Toolkit V6.5 - which was released this month.
> I looked on the Internet and found:
> http://developer.download.nvidia.com/compute/cuda/6_5/rel/docs/CUDA_Toolkit_Release_Notes.pdf
> https://developer.nvidia.com/cuda-downloads
> Current NVidia graphics card drivers are at
> http://www.nvidia.com/Download/index.aspx
> ... No, I don't know if installing the newest driver will solve my
> problem (vs Marc Bruce recompiling and sending out updated
> deconvolution program for ImageJ) ... on my to do list for Friday.
>
> * Manish pointed out to me that NVidia's gamers cards lines are
> for gamers, not intended for scientific (or financial's - let's
> hope your credit cards do not involve the FFT bug code) use.
> NVidia charges more for the "pro" level cards - which the TITAN
> and TITAN Z are. I lobbied Manish to support two (or more) TITAN
> Z's, hopefully they can name it "ZZ Top Decon" ... and that this
> would fit and work inside Zeiss LSM 710/780/880, Leica SP5/SP8
> PC's, whatever high content screening systems are at M.D. Anderson
> or driving distance in Houston, Andy York and Hari Shroff's iSIM
> system, various dual view SPIM/DSLM systems (with Hari's isotropic
> fusion deconvolution implemented on the ZZ's) and my lab's
> microscope PC (especially 'headless' within MetaMorph 7.8).
>
> In turn, i started lobbying Manish to go beyond just Z to
> implement Adam Hoppe's 3DFSR.
> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2426648/pdf/400.pdf
> See especially figures 5 and 6 (and 3DFSR is not just for
> fluorescence resonance energy transfer data).
> 61 lines of MatLab code
> http://sitemaker.umich.edu/4dimagingcenter/3dfsr
>
> Having managed a Zeiss LSM710, two Leica SP5's, now have access to
> a Leica SP8 (thanks to Tomasz Zal), and been dealing with
> MetaMorph since 1992 (with a couple of years off for spectral
> karyotyping & spectral pathology), the "spectral unmixing" we have
> access to in commercial systems -- I'm looking at you, Zeiss,
> Leica and MetaMorph (Molecular Devices) -- are lame. I estimate
> there are 3000 Zeiss and Leica confocal microscopes worldwide. All
> of them would benefit from the image quality improvements possible
> with headless, instant gratification ZZ Top 3DFSR.
>
> I also want to mention that deconvolution can be done on
> brightfield Z-series. Sokic et al recently published "WOSD" using
> an approach(es) Tim Holmes (a coauthor,
> http://www.lickenbrock.com/) and their colleagues started with
> Autoquant:
> 1. blind deconvolution ...
> http://wwwuser.gwdg.de/~uboehm/images/24.pdf
> <http://wwwuser.gwdg.de/%7Euboehm/images/24.pdf>
> 2. brightfield deconvolution by simply 'inverting' the contrast
> (brightest pixel intensity - data) then using their fluorescence
> blind decon.
> Sokic S, Larson JC, Larkin SM, Papavasiliou G, Holmes TJ, Brey EM.
> Label-free nondestructive imaging of vascular network structure in
> 3D culture. Microvasc Res. 2014 Mar;92:72-8. doi:
> 10.1016/j.mvr.2014.01.003. PubMed PMID: 24423617.
> http://www.ncbi.nlm.nih.gov/pubmed/24423617
> The Sokic discussion and supplemental file make several good
> points. The supplemental file recommends - and justifies (cites
> Holmes 2006):
> delta x = lambda / 4 NA
> delta Z = n * lambda / (NA)^2
>
> which I tabulate here for three objective lenses, and provide ratios:
>
> for 500 nm light
>
>
>
> 0.40 NA
>
>
>
> 0.75 NA
>
>
>
> 1.4 NA
>
> delta x,y
>
>
>
> 312 nm [300nm]
>
>
>
> 167 nm [~200 nm]
>
>
>
> 89 nm [90nm]
>
> delta z
>
>
>
> 3125 nm [3.0 um]
>
>
>
> 889 nm [900nm]
>
>
>
> 387 nm [400nm]
>
> dx/dz
>
>
>
> 0.0998
>
>
>
> 0.188
>
>
>
> 0.2299
>
> dz/dx
>
>
>
> 10.01
>
>
>
> 5.3
>
>
>
> 4.348
>
>
>
> //
>
> The 'inverted contrast' approach for brightfield deconvolution for
> non-absorbing is reasonable. For absorbing specimens, such as H&E,
> DAB&H, Trichrome -- see
> http://home.earthlink.net/~tiki_goddess/TikiGoddess.jpg
> <http://home.earthlink.net/%7Etiki_goddess/TikiGoddess.jpg> and
> http://works.bepress.com/gmcnamara/11/ -- the optical density of
> each channel needs to be computed ("Scaled Optical Density" in
> MetaMorph. A 'scatter only (or mostly)' channel can be acquired on
> most histology stained slides by using near infra-red light (ex.
> 700 nm from a Cy5/Alexa Fluor 647 emission filter). Not many
> 'histology' microscopes have Z-drives -- not a problem: use DAPI,
> GFP, and Texas Red bandpass emission filters. I have also acquired
> brightfield RGB images on my (ok, Miami's) confocal microscopes.
> the laser lines (ex. 405, 514, 633 nm) are not perfect matches to
> visual RGB, but are close enough thanks to the broadness of
> absorption spectrum of most histology slides (scattering in the
> case of the DAB polymer). Compared to the cost of most digital
> slide imagers -- I'm looking at you, Hamamatsu NanoZoomer (and
> others) -- $6K for "ZZ' GPU's (ok, and more for a good computer)
> and (hopefully reasonable price) B&B commercial GPU deonvolution
> software and tech support (and/or alternatives), could improve
> DSI's outputs significantly. In other words, ther are
> opportunities to improve image quality with deconvolution -- and
> 3DFSR -- in both fluorescence and histology/pathology.
>
> Best wishes and may all your photons be reassigned to the right
> place and channel,
>
> Sincerely,
>
> George
> p.s. The experiment I mentioned at the top had the unusual
> situation that the immunofluorescence needed 20 ms exposure times
> (SOLA lamp in single LED mode, narrow pass Semrock exciter
> filters, Leica or Semrock filter cubes, Leica plan apo 63x/1.4NA
> lens, FLASH4.0, MetaMorph, brightest pixels way over 10,000
> intensity value for both the Alexa Fluor 568 and the Alexa Fluor
> 647 immunostainings) ... while the Hoechst needed 200 ms to get a
> dim (under 5,000 intensity values) image. This is because I asked
> the user to use 100 ng/mL Hoechst, then one wash (mounting medium
> was PBS) instead of the usual high concentrations of DAPI or
> Hoechst. I am going to routinely use this low concentration and
> encourage my lab colleagues to use as well (I've always
> recommended against using the premade commercial dye+mounting
> media products).
> For another way to image with Hoechst, I encourage reading:
> http://www.ncbi.nlm.nih.gov/pubmed/24945151
> Szczurek AT, Prakash K, Lee HK, Zurek-Biesiada DJ, Best G, Hagmann
> M, Dobrucki JW, Cremer C, Birk U. Single molecule localization
> microscopy of the distribution of chromatin using Hoechst and DAPI
> fluorescent probes. Nucleus. 2014 Jun 19;5(4). [Epub ahead of
> print] PubMed PMID: 24945151.
>
>
> On 8/16/2014 9:46 AM, Lutz Schaefer wrote:
>
> *****
> To join, leave or search the confocal microscopy listserv, go to:
> http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
> Post images on http://www.imgur.com and include the link in
> your posting.
> *****
>
> Sergey,
> PSF generation shouldn’t take that long. The COSMOS also uses
> the more accurate Gauss Kronrod integrator which itself can be
> computationally intensive, depending on integrand behavior
> (e.g. with SA present) and specified numerical accuracy. I am
> using the more simple integration using the roots of Gauss
> Legendre polynomials instead. This is significantly faster,
> even when using 200-800 data points, not to mention that it is
> easily to be parallelized. My implementation which is also
> used in the Zeiss ZEN deconvolution software, runs a few
> hundred milliseconds, depending on SA. Concerning accuracy, in
> collaboration with the C. Preza lab we have compared this
> method to the Gauss Kronrod and found no significant
> deviations on both Haeberle and G&L.
>
> On a second observation of your post, as the PSF support falls
> off more or less rapidly towards the border you do not need to
> generate near zero values using these expensive methods...
>
> Hope this helps a bit further
> Regards
> Lutz
>
> -----Original Message----- From: Brian Northan
> Sent: Friday, August 15, 2014 7:41 AM
> To: [hidden email]
> <mailto:[hidden email]>
> Subject: Re: GPU-based deconvolution
>
> *****
> To join, leave or search the confocal microscopy listserv, go to:
> http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
> Post images on http://www.imgur.com and include the link in
> your posting.
> *****
>
> Hi Sergey
>
> COSMOS is a legacy software, but the only that
>
> supports PSF generation according to Haeberle and
> Török models.
> I'd like to
> speedup PSF generation, 1hr for 256*256*40 is too much.
>
>
> One hour is for sure too much. Did you try the Gibson Lanni
> model?? It
> should be faster. I used the COSMOS PSF generation code for
> another project
> and have a simple hack that makes it faster. Do you compile
> the code or
> just use their executables?? My copy generates a 256 by 256
> by 40 PSF in
> under 10 seconds.
>
> If compiling the code you can do the following. They use
> oversampling when
> generating PSFs. Under the main Cosm code tree there is a file
> /psf/psf/psf.cxx. There is a function in this file called
> 'evaluate'.
> The first line creates a 'radialPsf'. There is an additional
> optional
> variable you can pass to the 'radialPsf' constructor that
> specifies the
> oversampling factor (default value is 32). If you pass 2 the
> code will run
> faster and the result won't change too much.
>
> (You might want to test on a bead image or phantom sample to
> confirm this)
>
> Brian
>
>
> On Thu, Aug 14, 2014 at 8:57 AM, George McNamara
> <[hidden email] <mailto:[hidden email]>>
> wrote:
>
> *****
> To join, leave or search the confocal microscopy listserv,
> go to:
> http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
> Post images on http://www.imgur.com and include the link
> in your posting.
> *****
>
> Hi Sergey,
>
> One great point Manish emphasized to me is that anytime we
> are using
> FFT's, to give the FFT routines data with power of two
> dimensions. So,
> instead of 256*256*40, you would be better off with
> 256*256*32, and might
> be better off with 256*256*64.
>
> If COSMOS is no longer being developed by Prof. Preza, all
> the more reason
> for you to adopt it! It is using a GNU General Public License,
> http://cirl.memphis.edu/cosmos_faqs.php
>
> Likewise the Clarity deconvolution library at UNC is no
> longer being
> developed, consider taking over that.
>
> Adam Hoppe told me several months ago that he is
> continuing to work on
> 3DFSR, so consider contacting him at
> http://www.sdstate.edu/chem/
> faculty/adam-hoppe/
>
> I forgot to mention in yesterday's email the
> BoaDeconvolution -- already
> Parallel Deconvolution -- at
> http://pawliczek.net.pl/deconvolution/
> http://onlinelibrary.wiley.com/doi/10.1002/jemt.20773/abstract
>
> I've previously posted to the listserv more on GPU (and
> Xeon, Xeon Phi,
> Knights Landing proposal).
>
> best wishes,
>
> George
>
>
>
> On 8/14/2014 4:14 AM, Sergey Tauger wrote:
>
> *****
> To join, leave or search the confocal microscopy
> listserv, go to:
> http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
> Post images on http://www.imgur.com and include the
> link in your posting.
> *****
>
> George,
>
> Thank you for useful links. I'm going to write my own
> deconvolution
> software or
> contribute to existing. At the moment I use
> DeconvolutionLab and COSMOS
> for
> PSF generation. To my regret, COSMOS is a legacy
> software, but the only
> that
> supports PSF generation according to Haeberle and
> Török models.
> I'd like to
> speedup PSF generation, 1hr for 256*256*40 is too much.
>
> I do not overestimate my programming skills, so I'd
> better use a existing
> GPU-
> based PSF generation software as a "best practice"
> guide than writing my
> own
> implementation from scratch. If I succeed in speeding
> up PSF generation,
> I'll post
> a link to github in this thread.
>
> Best,
> Sergey
>
>
>
>
>
> --
>
>
>
> George McNamara, Ph.D.
> Single Cells Analyst
> L.J.N. Cooper Lab
> University of Texas M.D. Anderson Cancer Center
> Houston, TX 77054
> Tattletales http://works.bepress.com/gmcnamara/42
>
>
>
>
> --
>
>
>
> George McNamara, Ph.D.
> Single Cells Analyst
> L.J.N. Cooper Lab
> University of Texas M.D. Anderson Cancer Center
> Houston, TX 77054
> Tattletales http://works.bepress.com/gmcnamara/42
>
>

--

George McNamara, Ph.D.
Single Cells Analyst
L.J.N. Cooper Lab
University of Texas M.D. Anderson Cancer Center
Houston, TX 77054
Tattletales http://works.bepress.com/gmcnamara/42