Re: GPU-based deconvolution

Posted by Brian Northan on
URL: http://confocal-microscopy-list.275.s1.nabble.com/GPU-based-deconvolution-tp7582498p7582557.html

*****
To join, leave or search the confocal microscopy listserv, go to:
http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
Post images on http://www.imgur.com and include the link in your posting.
*****

Hi George

> "I checked again, and my hunch was correct: it's a fourier transform
> artefact. If you want to get around that from occurring, pad to 36 slices
> with black. Powers of 2, 3, 5, 7 (and combinations of those) remain speedy
> without taking up too much extra ram. I didn't have any memory metrics
> reported in that ImageJ plugin--too many synchronization points from java
> to the gpu."
>

1.  FFT libraries often come with a utility that returns the "nearest" fast
size for padding, but the last time I used CUDA, NVIDIA did not provide one.
Older versions of cuFFT were also much faster at powers of 2 than FFTW; by
this I mean cuFFT was ~10x faster at power-of-2 sizes, but only slightly
faster at non-power-of-2 sizes. This may have been improved in the latest
version of CUDA.
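
For anyone padding by hand, the "nearest fast size" utility is easy to
replicate. A minimal sketch in pure Python (FFTW and SciPy expose helpers of
this kind, e.g. `scipy.fft.next_fast_len`; the version below is only an
illustration, not either library's actual code):

```python
def next_fast_len(n):
    """Smallest integer >= n whose only prime factors are 2, 3, 5, or 7,
    i.e. a size at which most FFT libraries stay fast."""
    m = n
    while True:
        k = m
        for p in (2, 3, 5, 7):
            while k % p == 0:
                k //= p
        if k == 1:          # m factored completely into 2/3/5/7
            return m
        m += 1

print(next_fast_len(33))    # -> 35 (5 * 7)
print(next_fast_len(40))    # -> 40 (2^3 * 5, already fast)
```

Padding each FFT dimension up to such a size (e.g. 33 planes up to 35 or 36)
usually costs little extra memory but avoids the slow prime-size code paths.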

2.  It would be important to do a very rigorous comparison of Bruce and
Butte's results to other algorithms. I know their algorithm is based on RL,
but I'm not sure what type of regularization and acceleration their
algorithm uses; this could change the results quite a bit. It would be
interesting to see a comparison to the EPFL (BIG lab) algorithms, Huygens,
cellSens, Zeiss, and Autoquant.
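
For such a comparison, a quantitative error metric on a phantom with known
ground truth is more convincing than visual inspection. A minimal sketch
with NumPy (the metric choice and the toy bead phantom are my own
illustration, not anything from the Bruce and Butte work):

```python
import numpy as np

def nrmse(restored, truth):
    """Normalized root-mean-square error between a deconvolved stack and
    the ground-truth phantom; lower is better."""
    rmse = np.sqrt(np.mean((restored - truth) ** 2))
    return rmse / (truth.max() - truth.min())

truth = np.zeros((16, 16, 16))
truth[8, 8, 8] = 1.0                 # toy "bead" phantom
rng = np.random.default_rng(0)
restored = truth + 0.01 * rng.standard_normal(truth.shape)
print(nrmse(restored, truth))        # small, noise-dominated value
```

The same metric run over several packages' outputs on one simulated stack
would make the comparison concrete.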

3.  Padding is tricky. Padding with zeros can lead to artifacts as well, so
a more subtle strategy is needed to deal with edge artifacts. The EPFL lab
ran the last "deconvolution challenge" and recommended using a non-circulant
convolution model (
http://bigwww.epfl.ch/deconvolution/challenge/index.html?p=documentation/theory/forwardmodel;
ask for their Matlab code, as it makes it clearer what they are doing).
However, their strategy can lead to large image buffers and reduced speed
(depending on the extent of the PSF). There are other strategies to deal
with edges: David Biggs has a section about them in his thesis (
https://researchspace.auckland.ac.nz/handle/2292/1760), which is 15+ years
old but contains a lot of good information.
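
To make the zero-padding pitfall concrete: with NumPy the boundary handling
is a one-word change, and reflection padding is one simple way to avoid the
hard step that zero ("black") padding creates at the volume edge (a sketch
only; real packages use more elaborate schemes, such as the non-circulant
model mentioned above):

```python
import numpy as np

def pad_axial(stack, extra_planes, mode="reflect"):
    """Pad a Z-stack along the axial dimension before FFT-based
    deconvolution. mode="constant" gives zero ("black") padding, which
    leaves a sharp step at the boundary; mode="reflect" mirrors the last
    planes so the signal stays continuous at the edge."""
    return np.pad(stack, ((0, extra_planes), (0, 0), (0, 0)), mode=mode)

stack = np.random.default_rng(0).random((32, 64, 64))   # 32-plane toy stack
zero_pad = pad_axial(stack, 4, mode="constant")         # 32 -> 36 planes
refl_pad = pad_axial(stack, 4, mode="reflect")
print(zero_pad.shape, refl_pad.shape)                   # (36, 64, 64) twice
```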

4.  There is an open version of GPU deconvolution by Bob Pepin:
https://github.com/bobpepin/YacuDecu. I believe it implements standard
Richardson-Lucy, but it could be a good starting point for something else.
There are wrappers for Matlab and Imaris.
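
For readers who want to see what "standard Richardson-Lucy" means before
digging into YacuDecu's CUDA source, here is a minimal CPU sketch in
Python/NumPy (the plain multiplicative form with no acceleration or
regularization; this is my own illustration, not Bob Pepin's code):

```python
import numpy as np
from numpy.fft import fftn, ifftn, ifftshift

def richardson_lucy(image, psf, iterations=10):
    """Plain FFT-based Richardson-Lucy deconvolution. `psf` must be
    centered in its array and have the same shape as `image`."""
    psf = psf / psf.sum()
    otf = fftn(ifftshift(psf))       # shift the PSF center to the origin
    estimate = np.full_like(image, image.mean())
    eps = 1e-12                      # guard against division by zero
    for _ in range(iterations):
        blurred = np.real(ifftn(fftn(estimate) * otf))
        ratio = image / (blurred + eps)
        # correlate the ratio with the PSF and apply the update
        estimate *= np.real(ifftn(fftn(ratio) * np.conj(otf)))
    return estimate
```

A quick sanity check: with a delta-function PSF the estimate converges to
the input image immediately. A GPU port essentially swaps `fftn`/`ifftn`
for cuFFT calls and the element-wise steps for kernels.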

Brian






On Thu, Aug 21, 2014 at 10:27 PM, George McNamara <[hidden email]
> wrote:

>
> Dear Confocal Listserv,
>
> With respect to my previous post about Bruce & Butte's NVidia CUDA GPU
> deconvolution bug, I exchanged emails with Dr. Marc Bruce and Prof. Manish
> Butte, and spoke with Manish today. Here's what I learned.
>
> There is a workaround: Padding the acquisition with several extra image
> planes at the beginning does work - per Marc Bruce's suggestion below. In
> fact, on Wednesday I acquired a 32 plane Z-series of a novel specimen, in
> which I started even further in the coverglass than usual (63x, 1.4NA lens,
> 158 nm XY pixel size, 300 nm Z-steps, 20 ms exposure time for the Alexa
> Fluor 568 immunostaining). B&B's deconvolution took under 5 seconds per
> channel for 1024x1024 pixels, 32 planes, on the TITAN Z. The first 8
> planes exhibited the bug; I deleted them once I had the data back in
> MetaMorph. The planes with the cell data were, in a word, spectacular. To
> give credit where credit is due: my colleague whose experiment we were
> imaging had 'nailed the landing' on experiment design and implementation
> (I was just the microphotographer). It would be great if GPU (and/or
> Xeon/Xeon Phi/Knights Landing; see earlier posts) could give us "instant
> gratification" clean-up of our confocal and widefield images (more on
> this later).
>
> Both Marc (message below) and Manish pointed out that for the CUDA GPU
> deconvolution (and maybe for FFTs in general?),
> "Powers of 2, 3, 5, 7 (and combinations of those) remain speedy" - which
> surprised me, even though I've been dealing with FFTs for a long time,
> since converting (Fortran to Turbo Pascal) and extending (mean harmonics
> of several shapes, then inverse FFT with select harmonics) Rohlf &
> Archie's Elliptical Fourier Analysis of shapes,
> http://sysbio.oxfordjournals.org/content/33/3/302
> This is the basis for the EFA available in MetaMorph's Integrated
> Morphometry Analysis ("IMA") / Morphometry module (turn on the option in
> Preferences to get access to the parameters).
>
> Quoting Marc (newer message in first paragraph):
>
>     "I checked again, and my hunch was correct: it's a fourier transform
> artefact. If you want to get around that from occurring, pad to 36 slices
> with black. Powers of 2, 3, 5, 7 (and combinations of those) remain speedy
> without taking up too much extra ram. I didn't have any memory metrics
> reported in that ImageJ plugin--too many synchronization points from java
> to the gpu."
>
>     "I've been trying to look into your problem, but I just defended my
> thesis (unrelated to microscopy) so I've been otherwise swamped for the
> past several months. If the first several planes are bad but it's otherwise
> fine, then NVIDIA's FFT library likely has a bug; it's probably a fourier
> wrapping artefact. There might be a hardware bug in your card - I don't
> have one of those. Try adding a few slices of black to the top and bottom
> (i.e. pad to 36 slices). Keep in mind that having your card drive two large
> resolution monitors ties up quite a bit of that 6 gb on its own."
>
> From Manish:
>
> * The bug appears to be in NVidia's library, and may have been fixed in
> NVidia's CUDA Toolkit V6.5 - which was released this month.
> I looked on the Internet and found:
> http://developer.download.nvidia.com/compute/cuda/6_5/rel/docs/CUDA_Toolkit_Release_Notes.pdf
> https://developer.nvidia.com/cuda-downloads
> Current NVidia graphics card drivers are at
> http://www.nvidia.com/Download/index.aspx
> ... No, I don't know if installing the newest driver will solve my problem
> (vs Marc Bruce recompiling and sending out updated deconvolution program
> for ImageJ) ... on my to do list for Friday.
>
> * Manish pointed out to me that NVidia's gamer card lines are for gamers,
> not intended for scientific (or financial - let's hope your credit cards
> do not involve the FFT bug code) use. NVidia charges more for the "pro"
> level cards - which the TITAN and TITAN Z are. I lobbied Manish
> to support two (or more) TITAN Z's, hopefully they can name it "ZZ Top
> Decon" ... and that this would fit and work inside Zeiss LSM 710/780/880,
> Leica SP5/SP8 PC's, whatever high content screening systems are at M.D.
> Anderson or driving distance in Houston, Andy York and Hari Shroff's iSIM
> system, various dual view SPIM/DSLM systems (with Hari's isotropic fusion
> deconvolution implemented on the ZZ's) and my lab's microscope PC
> (especially 'headless' within MetaMorph 7.8).
>
> In turn, I started lobbying Manish to go beyond just Z to implement Adam
> Hoppe's 3DFSR:
> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2426648/pdf/400.pdf
> See especially figures 5 and 6 (and 3DFSR is not just for fluorescence
> resonance energy transfer data).
> 61 lines of MatLab code
> http://sitemaker.umich.edu/4dimagingcenter/3dfsr
>
> Having managed a Zeiss LSM710 and two Leica SP5's, now having access to a
> Leica SP8 (thanks to Tomasz Zal), and having dealt with MetaMorph since
> 1992 (with a couple of years off for spectral karyotyping & spectral
> pathology), I find the "spectral unmixing" we have access to in commercial
> systems -- I'm looking at you, Zeiss, Leica and MetaMorph (Molecular
> Devices) -- lame.
> I estimate there are 3000 Zeiss and Leica confocal microscopes worldwide.
> All of them would benefit from the image quality improvements possible with
> headless, instant gratification ZZ Top 3DFSR.
>
> I also want to mention that deconvolution can be done on brightfield
> Z-series. Sokic et al. recently published "WOSD" using approaches that Tim
> Holmes (a coauthor, http://www.lickenbrock.com/) and his colleagues
> started with Autoquant:
> 1. blind deconvolution ... http://wwwuser.gwdg.de/~uboehm/images/24.pdf
> 2. brightfield deconvolution by simply 'inverting' the contrast (brightest
> pixel intensity - data) then using their fluorescence blind decon.
> Sokic S, Larson JC, Larkin SM, Papavasiliou G, Holmes TJ, Brey EM.
> Label-free nondestructive imaging of vascular network structure in 3D
> culture. Microvasc Res. 2014 Mar;92:72-8. doi: 10.1016/j.mvr.2014.01.003.
> PubMed PMID: 24423617.
> http://www.ncbi.nlm.nih.gov/pubmed/24423617
> The Sokic discussion and supplemental file make several good points. The
> supplemental file recommends - and justifies (cites Holmes 2006):
> delta x = lambda / (4 NA)
> delta z = n * lambda / NA^2
>
> which I tabulate here for three objective lenses, and provide ratios:
>
> for 500 nm light (bracketed values are rounded, practical step sizes):
>
>                0.40 NA            0.75 NA             1.4 NA
> delta x,y      312 nm  [300 nm]   167 nm  [~200 nm]   89 nm   [90 nm]
> delta z        3125 nm [3.0 um]   889 nm  [900 nm]    387 nm  [400 nm]
> dx/dz          0.0998             0.188               0.2299
> dz/dx          10.01              5.3                 4.348
>
> The 'inverted contrast' approach for brightfield deconvolution of
> non-absorbing specimens is reasonable. For absorbing specimens, such as
> H&E, DAB&H, and Trichrome -- see
> http://home.earthlink.net/~tiki_goddess/TikiGoddess.jpg and
> http://works.bepress.com/gmcnamara/11/ -- the optical density of each
> channel needs to be computed ("Scaled Optical Density" in MetaMorph). A
> 'scatter only (or mostly)' channel can be acquired on most histology
> stained slides by using near infra-red light (e.g. 700 nm from a
> Cy5/Alexa Fluor 647 emission filter). Not many 'histology' microscopes
> have Z-drives -- not a problem: use DAPI, GFP, and Texas Red bandpass
> emission filters. I have also acquired brightfield RGB images on my (ok,
> Miami's) confocal microscopes. The laser lines (e.g. 405, 514, 633 nm)
> are not perfect matches to visual RGB, but are close enough thanks to the
> broadness of the absorption spectra of most histology stains (scattering
> in the case of the DAB polymer). Compared to the cost of most digital
> slide imagers -- I'm looking at you, Hamamatsu NanoZoomer (and others) --
> $6K for 'ZZ' GPUs (ok, and more for a good computer) plus (hopefully
> reasonably priced) B&B commercial GPU deconvolution software and tech
> support (and/or alternatives) could improve DSI's outputs significantly.
> In other words, there are opportunities to improve image quality with
> deconvolution -- and 3DFSR -- in both fluorescence and
> histology/pathology.
>
> Best wishes and may all your photons be reassigned to the right place and
> channel,
>
> Sincerely,
>
> George
> p.s. The experiment I mentioned at the top had the unusual situation that
> the immunofluorescence needed 20 ms exposure times (SOLA lamp in single
> LED mode, narrow pass Semrock exciter filters, Leica or Semrock filter
> cubes, Leica plan apo 63x/1.4NA lens, FLASH4.0, MetaMorph, brightest
> pixels way over 10,000 intensity value for both the Alexa Fluor 568 and
> the Alexa Fluor 647 immunostainings) ... while the Hoechst needed 200 ms
> to get a dim (under 5,000 intensity values) image. This is because I
> asked the user to use 100 ng/mL Hoechst, then one wash (mounting medium
> was PBS), instead of the usual high concentrations of DAPI or Hoechst. I
> am going to use this low concentration routinely and encourage my lab
> colleagues to use it as well (I've always recommended against using the
> premade commercial dye+mounting media products).
> For another way to image with Hoechst, I encourage reading:
> http://www.ncbi.nlm.nih.gov/pubmed/24945151
> Szczurek AT, Prakash K, Lee HK, Zurek-Biesiada DJ, Best G, Hagmann M,
> Dobrucki JW, Cremer C, Birk U. Single molecule localization microscopy of
> the distribution of chromatin using Hoechst and DAPI fluorescent probes.
> Nucleus. 2014 Jun 19;5(4). [Epub ahead of print] PubMed PMID: 24945151.
>
>
> On 8/16/2014 9:46 AM, Lutz Schaefer wrote:
>
>>
>> Sergey,
>> PSF generation shouldn't take that long. COSMOS also uses the more
>> accurate Gauss-Kronrod integrator, which itself can be computationally
>> intensive, depending on integrand behavior (e.g. with spherical
>> aberration present) and the specified numerical accuracy. I am using a
>> simpler integration based on the roots of Gauss-Legendre polynomials
>> instead. This is significantly faster, even when using 200-800 data
>> points, not to mention that it is easily parallelized. My
>> implementation, which is also used in the Zeiss ZEN deconvolution
>> software, runs in a few hundred milliseconds, depending on SA.
>> Concerning accuracy, in collaboration with the C. Preza lab we compared
>> this method to Gauss-Kronrod and found no significant deviations on
>> both the Haeberle and G&L models.
>>
>> On a second observation of your post: as the PSF support falls off more
>> or less rapidly towards the border, you do not need to generate
>> near-zero values using these expensive methods...
>>
>> Hope this helps a bit further
>> Regards
>> Lutz
>>
>> -----Original Message----- From: Brian Northan
>> Sent: Friday, August 15, 2014 7:41 AM
>> To: [hidden email]
>> Subject: Re: GPU-based deconvolution
>>
>>
>> Hi Sergey
>>
>>> COSMOS is a legacy software, but the only one that supports PSF
>>> generation according to Haeberle and Török models. I'd like to speed
>>> up PSF generation; 1 hr for 256*256*40 is too much.
>>>
>> One hour is for sure too much. Did you try the Gibson-Lanni model? It
>> should be faster. I used the COSMOS PSF generation code for another
>> project and have a simple hack that makes it faster. Do you compile the
>> code or just use their executables? My copy generates a 256 by 256 by
>> 40 PSF in under 10 seconds.
>>
>> If you are compiling the code, you can do the following. They use
>> oversampling when generating PSFs. Under the main Cosm code tree there
>> is a file /psf/psf/psf.cxx, and in this file there is a function called
>> 'evaluate'. The first line creates a 'radialPsf'. There is an
>> additional optional variable you can pass to the 'radialPsf'
>> constructor that specifies the oversampling factor (default value is
>> 32). If you pass 2, the code will run faster and the result won't
>> change much.
>>
>> (You might want to test on a bead image or phantom sample to confirm this)
>>
>> Brian
>>
>>
>> On Thu, Aug 14, 2014 at 8:57 AM, George McNamara <
>> [hidden email]>
>> wrote:
>>
>>>
>>> Hi Sergey,
>>>
>>> One great point Manish emphasized to me is that any time we are using
>>> FFTs, we should give the FFT routines data with power-of-two
>>> dimensions. So, instead of 256*256*40, you would be better off with
>>> 256*256*32, and might be better off with 256*256*64.
>>>
>>> If COSMOS is no longer being developed by Prof. Preza, all the more
>>> reason for you to adopt it! It uses the GNU General Public License:
>>> http://cirl.memphis.edu/cosmos_faqs.php
>>>
>>> Likewise, the Clarity deconvolution library at UNC is no longer being
>>> developed; consider taking that over as well.
>>>
>>> Adam Hoppe told me several months ago that he is continuing to work
>>> on 3DFSR, so consider contacting him at
>>> http://www.sdstate.edu/chem/faculty/adam-hoppe/
>>>
>>> I forgot to mention in yesterday's email the BoaDeconvolution
>>> (already parallel deconvolution) at
>>> http://pawliczek.net.pl/deconvolution/
>>> http://onlinelibrary.wiley.com/doi/10.1002/jemt.20773/abstract
>>>
>>> I've previously posted to the listserv more on GPU (and Xeon, Xeon Phi,
>>> Knights Landing proposal).
>>>
>>> best wishes,
>>>
>>> George
>>>
>>>
>>>
>>> On 8/14/2014 4:14 AM, Sergey Tauger wrote:
>>>
>>>>
>>>> George,
>>>>
>>>> Thank you for the useful links. I'm going to write my own
>>>> deconvolution software or contribute to existing software. At the
>>>> moment I use DeconvolutionLab and COSMOS for PSF generation. To my
>>>> regret, COSMOS is a legacy software, but the only one that supports
>>>> PSF generation according to Haeberle and Török models. I'd like to
>>>> speed up PSF generation; 1 hr for 256*256*40 is too much.
>>>>
>>>> I do not overestimate my programming skills, so I'd rather use
>>>> existing GPU-based PSF generation software as a "best practice" guide
>>>> than write my own implementation from scratch. If I succeed in
>>>> speeding up PSF generation, I'll post a link to GitHub in this
>>>> thread.
>>>>
>>>> Best,
>>>> Sergey
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>>
>>>
>>>
>>> George McNamara, Ph.D.
>>> Single Cells Analyst
>>> L.J.N. Cooper Lab
>>> University of Texas M.D. Anderson Cancer Center
>>> Houston, TX 77054
>>> Tattletales http://works.bepress.com/gmcnamara/42
>>>
>>>
>>
>
> --
>
>
>
> George McNamara, Ph.D.
> Single Cells Analyst
> L.J.N. Cooper Lab
> University of Texas M.D. Anderson Cancer Center
> Houston, TX 77054
> Tattletales http://works.bepress.com/gmcnamara/42
>