Re: GPU-based deconvolution

Posted by George McNamara on
URL: http://confocal-microscopy-list.275.s1.nabble.com/GPU-based-deconvolution-tp7582498p7582553.html

*****
To join, leave or search the confocal microscopy listserv, go to:
http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
Post images on http://www.imgur.com and include the link in your posting.
*****

Dear Confocal Listserv,

With respect to my previous post about Bruce & Butte's NVidia CUDA GPU
deconvolution bug, I exchanged emails with Dr. Marc Bruce and Prof.
Manish Butte, and spoke with Manish today. Here's what I learned.

There is a workaround: padding the acquisition with several extra image
planes at the beginning does work, per Marc Bruce's suggestion below.
In fact, on Wednesday I acquired a 32-plane Z-series of a novel
specimen, in which I started even further into the coverglass than usual
(63x, 1.4 NA lens, 158 nm XY pixel size, 300 nm Z-steps, 20 ms exposure
time for the Alexa Fluor 568 immunostaining). B&B's deconvolution on the
TITAN Z took under 5 seconds per channel for 1024x1024 pixels, 32 planes.
The first 8 planes exhibited the bug; I deleted them once I had the data
back in MetaMorph. The planes with the cell data were, in one word,
spectacular. To give credit where credit is due: my colleague whose
experiment we were imaging had 'nailed the landing' on experiment design
and implementation (I was just the microphotographer). It would be great
if GPUs (and/or Xeon/Xeon Phi/Knights Landing, see earlier posts) could
give us "instant gratification" clean-up of our confocal and widefield
images (more on this later).

Both Marc (message below) and Manish pointed out that for the CUDA GPU
deconvolution (and maybe FFTs in general?),
"Powers of 2, 3, 5, 7 (and combinations of those) remain speedy" - which
surprised me, even though I've been dealing with FFTs for a long time,
since converting (from Fortran to Turbo Pascal) and extending (mean
harmonics of several shapes, then inverse FFT with select harmonics)
Rohlf & Archie's Elliptical Fourier Analysis of shapes,
http://sysbio.oxfordjournals.org/content/33/3/302
This is the basis for the EFA available in MetaMorph's Integrated
Morphometry Analysis ("IMA") / Morphometry module (turn on the option in
Preferences to get access to the parameters).

Quoting Marc (newer message in first paragraph):

     "I checked again, and my hunch was correct: it's a fourier
transform artefact. If you want to get around that from occurring, pad
to 36 slices with black. Powers of 2, 3, 5, 7 (and combinations of
those) remain speedy without taking up too much extra ram. I didn't have
any memory metrics reported in that ImageJ plugin--too many
synchronization points from java to the gpu."

     "I've been trying to look into your problem, but I just defended my
thesis (unrelated to microscopy) so I've been otherwise swamped for the
past several months. If the first several planes are bad but it's
otherwise fine, then NVIDIA's FFT library likely has a bug; it's
probably a fourier wrapping artefact. There might be a hardware bug in
your card - I don't have one of those. Try adding a few slices of black
to the top and bottom (i.e. pad to 36 slices). Keep in mind that having
your card drive two large resolution monitors ties up quite a bit of
that 6 gb on its own."
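
Marc's "powers of 2, 3, 5, 7" advice can be turned into a simple rule
for picking a padded slice count. This is my own minimal sketch of that
rule (not code from B&B's plugin): find the smallest stack size at or
above the acquired plane count whose prime factors are all in
{2, 3, 5, 7}, then pad with black planes up to that size.

```python
def next_fast_len(n):
    """Smallest integer >= n whose prime factors are all in {2, 3, 5, 7},
    i.e. a size the FFT can handle without falling back to a slow path."""
    def is_smooth(m):
        for p in (2, 3, 5, 7):
            while m % p == 0:
                m //= p
        return m == 1
    while not is_smooth(n):
        n += 1
    return n

# 32 planes (2^5) are already "fast"; a 33-plane stack would pad to
# 35 = 5*7, and a 41-plane stack to 42 = 2*3*7.
for planes in (32, 33, 41):
    print(planes, "->", next_fast_len(planes))
```

(SciPy ships an equivalent helper, scipy.fft.next_fast_len, if you would
rather not roll your own.)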

From Manish:

* The bug appears to be in NVidia's library, and may have been fixed in
NVidia's CUDA Toolkit V6.5 - which was released this month.
I looked on the Internet and found:
http://developer.download.nvidia.com/compute/cuda/6_5/rel/docs/CUDA_Toolkit_Release_Notes.pdf
https://developer.nvidia.com/cuda-downloads
Current NVidia graphics card drivers are at
http://www.nvidia.com/Download/index.aspx
... No, I don't know if installing the newest driver will solve my
problem (vs. Marc Bruce recompiling and sending out an updated
deconvolution program for ImageJ) ... on my to-do list for Friday.

* Manish pointed out to me that NVidia's gamer card lines are for
gamers, not intended for scientific (or financial - let's hope your
credit cards do not involve the FFT bug code) use. NVidia charges more
for the "pro" level cards - which the TITAN and TITAN Z are. I lobbied
Manish to support two (or more) TITAN Z's, hopefully they can name it
"ZZ Top Decon" ... and that this would fit and work inside Zeiss LSM
710/780/880, Leica SP5/SP8 PC's, whatever high content screening systems
are at M.D. Anderson or driving distance in Houston, Andy York and Hari
Shroff's iSIM system, various dual view SPIM/DSLM systems (with Hari's
isotropic fusion deconvolution implemented on the ZZ's) and my lab's
microscope PC (especially 'headless' within MetaMorph 7.8).

In turn, I started lobbying Manish to go beyond just Z to implement Adam
Hoppe's 3DFSR.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2426648/pdf/400.pdf
See especially figures 5 and 6 (and 3DFSR is not just for fluorescence
resonance energy transfer data).
61 lines of MATLAB code
http://sitemaker.umich.edu/4dimagingcenter/3dfsr

Having managed a Zeiss LSM710 and two Leica SP5's, now having access to
a Leica SP8 (thanks to Tomasz Zal), and having dealt with MetaMorph
since 1992 (with a couple of years off for spectral karyotyping &
spectral pathology), I can say the "spectral unmixing" we have access to
in commercial systems -- I'm looking at you, Zeiss, Leica and MetaMorph
(Molecular Devices) -- is lame. I estimate there are 3,000 Zeiss and
Leica confocal microscopes worldwide. All of them would benefit from the
image quality improvements possible with headless, instant-gratification
ZZ Top 3DFSR.

I also want to mention that deconvolution can be done on brightfield
Z-series. Sokic et al recently published "WOSD" using approaches Tim
Holmes (a coauthor, http://www.lickenbrock.com/) and his colleagues
started with Autoquant:
1. blind deconvolution ... http://wwwuser.gwdg.de/~uboehm/images/24.pdf
2. brightfield deconvolution, by simply 'inverting' the contrast
(brightest pixel intensity minus data) then using their fluorescence
blind decon.
Sokic S, Larson JC, Larkin SM, Papavasiliou G, Holmes TJ, Brey EM.
Label-free nondestructive imaging of vascular network structure in 3D
culture. Microvasc Res. 2014 Mar;92:72-8. doi:
10.1016/j.mvr.2014.01.003. PubMed PMID: 24423617.
http://www.ncbi.nlm.nih.gov/pubmed/24423617
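
The contrast inversion in step 2 is trivial to express. A minimal sketch
(my own, not the WOSD code; the function name is mine):

```python
import numpy as np

def invert_brightfield(stack):
    """Invert a brightfield Z-stack (brightest pixel intensity minus data),
    so absorbing/scattering structures become bright on a dark background,
    suitable as input to a fluorescence-style blind deconvolution."""
    return stack.max() - stack

# Example: a tiny 2x2 "stack" with a 255 white level.
stack = np.array([[10, 200],
                  [50, 255]])
print(invert_brightfield(stack))
```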
The Sokic discussion and supplemental file make several good points. The
supplemental file recommends - and justifies (cites Holmes 2006):
delta x = lambda / 4 NA
delta Z = n * lambda / (NA)^2

which I tabulate here for three objective lenses, and provide ratios
(for 500 nm light):

              0.40 NA            0.75 NA            1.4 NA
delta x,y     312 nm  [300 nm]   167 nm [~200 nm]   89 nm  [90 nm]
delta z       3125 nm [3.0 um]   889 nm [900 nm]    387 nm [400 nm]
dx/dz         0.0998             0.188              0.2299
dz/dx         10.01              5.3                4.348
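
The table values follow directly from the two formulas. A quick sketch
to reproduce them (my assumption: immersion refractive index n = 1.0 for
the dry 0.40 and 0.75 NA lenses, n = 1.515 for the 1.4 NA oil lens):

```python
def sampling_nm(wavelength_nm, na, n_immersion):
    """Sokic et al. supplement sampling recommendations:
    delta_x = lambda / (4 * NA), delta_z = n * lambda / NA^2."""
    dx = wavelength_nm / (4 * na)
    dz = n_immersion * wavelength_nm / na ** 2
    return dx, dz

for na, n_imm in [(0.40, 1.0), (0.75, 1.0), (1.4, 1.515)]:
    dx, dz = sampling_nm(500, na, n_imm)
    print(f"NA {na}: dx,y = {dx:.0f} nm, dz = {dz:.0f} nm")
```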

The 'inverted contrast' approach to brightfield deconvolution is
reasonable for non-absorbing specimens. For absorbing specimens, such as
H&E, DAB&H, Trichrome -- see
http://home.earthlink.net/~tiki_goddess/TikiGoddess.jpg and
http://works.bepress.com/gmcnamara/11/ -- the optical density of each
channel needs to be computed ("Scaled Optical Density" in MetaMorph). A
'scatter only (or mostly)' channel can be acquired on most histology
stained slides by using near-infrared light (ex. 700 nm from a
Cy5/Alexa Fluor 647 emission filter). Not many 'histology' microscopes
have Z-drives -- not a problem: use DAPI, GFP, and Texas Red bandpass
emission filters. I have also acquired brightfield RGB images on my (ok,
Miami's) confocal microscopes. The laser lines (ex. 405, 514, 633 nm)
are not perfect matches to visual RGB, but are close enough thanks to
the broadness of the absorption spectra of most histology slides
(scattering in the case of the DAB polymer). Compared to the cost of
most digital slide imagers -- I'm looking at you, Hamamatsu NanoZoomer
(and others) -- $6K for "ZZ" GPUs (ok, and more for a good computer)
and (hopefully reasonably priced) B&B commercial GPU deconvolution
software and tech support (and/or alternatives) could improve DSI
outputs significantly. In other words, there are opportunities to
improve image quality with deconvolution -- and 3DFSR -- in both
fluorescence and histology/pathology.
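
For the absorbing case, the standard Beer-Lambert conversion to optical
density per channel looks like this. A minimal sketch (my own, not
MetaMorph's Scaled Optical Density implementation; the white level would
come from an unstained region of the slide):

```python
import numpy as np

def optical_density(image, white_level):
    """Per-channel optical density for absorbing stains:
    OD = -log10(I / I0), where I0 is the background (unstained)
    intensity for that channel. Intensities are clipped to >= 1
    to avoid taking the log of zero."""
    i = np.clip(image.astype(float), 1.0, None)
    return -np.log10(i / white_level)

# Example: background pixel (OD ~ 0) and a pixel transmitting 10%
# of the light (OD = 1).
print(optical_density(np.array([255.0, 25.5]), 255.0))
```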

Best wishes and may all your photons be reassigned to the right place
and channel,

Sincerely,

George
p.s. The experiment I mentioned at the top had the unusual situation  
that the immunofluorescence needed 20 ms exposure times (SOLA lamp in
single LED mode, narrow pass Semrock exciter filters, Leica or Semrock
filter cubes, Leica plan apo 63x/1.4NA lens, FLASH4.0, MetaMorph,
brightest pixels way over 10,000 intensity value for both the Alexa
Fluor 568 and the Alexa Fluor 647 immunostainings) ... while the Hoechst
needed 200 ms to get a dim (under 5,000 intensity values) image. This is
because I asked the user to use 100 ng/mL Hoechst, then one wash
(mounting medium was PBS) instead of the usual high concentrations of
DAPI or Hoechst. I am going to routinely use this low concentration and
encourage my lab colleagues to use it as well (I've always recommended
against using the premade commercial dye+mounting media products).
For another way to image with Hoechst, I encourage reading:
http://www.ncbi.nlm.nih.gov/pubmed/24945151
Szczurek AT, Prakash K, Lee HK, Zurek-Biesiada DJ, Best G, Hagmann M,
Dobrucki JW, Cremer C, Birk U. Single molecule localization microscopy
of the distribution of chromatin using Hoechst and DAPI fluorescent
probes. Nucleus. 2014 Jun 19;5(4). [Epub ahead of print] PubMed PMID:
24945151.

On 8/16/2014 9:46 AM, Lutz Schaefer wrote:

>
> Sergey,
> PSF generation shouldn’t take that long. The COSMOS also uses the more
> accurate Gauss Kronrod integrator which itself can be computationally
> intensive, depending on integrand behavior (e.g. with SA present) and
> specified numerical accuracy. I am using the more simple integration
> using the roots of Gauss Legendre polynomials instead. This is
> significantly faster, even when using 200-800 data points, not to
> mention that it is easily to be parallelized. My implementation which
> is also used in the Zeiss ZEN deconvolution software, runs a few
> hundred milliseconds, depending on SA. Concerning accuracy, in
> collaboration with the C. Preza lab we have compared this method to
> the Gauss Kronrod and found no significant deviations on both Haeberle
> and G&L.
>
> On a second observation of your post, as the PSF support falls off
> more or less rapidly towards the border you do not need to generate
> near zero values using these expensive methods...
>
> Hope this helps a bit further
> Regards
> Lutz
>
> -----Original Message----- From: Brian Northan
> Sent: Friday, August 15, 2014 7:41 AM
> To: [hidden email]
> Subject: Re: GPU-based deconvolution
>
>
> Hi Sergey
>
> COSMOS is a legacy software, but the only that
>> supports PSF generation according to Haeberle and Török
>> models.
>> I'd like to
>> speedup PSF generation, 1hr for 256*256*40 is too much.
>>
>
> One hour is for sure too much.  Did you try the Gibson Lanni model??  It
> should be faster. I used the COSMOS PSF generation code for another
> project
> and have a simple hack that makes it faster.   Do you compile the code or
> just use their executables??  My copy generates a 256 by 256 by 40 PSF in
> under 10 seconds.
>
> If compiling the code you can do the following.  They use oversampling
> when
> generating PSFs.   Under the main Cosm code tree there is a file
> /psf/psf/psf.cxx.   There is a function in this file called 'evaluate'.
> The first line creates a 'radialPsf'.   There is an additional optional
> variable you can pass to the 'radialPsf' constructor that specifies the
> oversampling factor (default value is 32).  If you pass 2 the code
> will run
> faster and the result won't change too much.
>
> (You might want to test on a bead image or phantom sample to confirm
> this)
>
> Brian
>
>
> On Thu, Aug 14, 2014 at 8:57 AM, George McNamara
> <[hidden email]>
> wrote:
>
>>
>> Hi Sergey,
>>
>> One great point Manish emphasized to me is that anytime we are using
>> FFT's, to give the FFT routines data with power of two dimensions. So,
>> instead of 256*256*40, you would be better off with 256*256*32, and
>> might
>> be better off with 256*256*64.
>>
>> If COSMOS is no longer being developed by Prof. Preza, all the more
>> reason
>> for you to adopt it! It is using a GNU General Public License,
>> http://cirl.memphis.edu/cosmos_faqs.php
>>
>> Likewise the Clarity deconvolution library at UNC is no longer being
>> developed, consider taking over that.
>>
>> Adam Hoppe told me several months ago that he is continuing to work on
>> 3DFSR, so consider contacting him at http://www.sdstate.edu/chem/
>> faculty/adam-hoppe/
>>
>> I forgot to mention in yesterday's email the BoaDeconvolution -- already
>> Parallel Deconvolution --  at
>> http://pawliczek.net.pl/deconvolution/
>> http://onlinelibrary.wiley.com/doi/10.1002/jemt.20773/abstract
>>
>> I've previously posted to the listserv more on GPU (and Xeon, Xeon Phi,
>> Knights Landing proposal).
>>
>> best wishes,
>>
>> George
>>
>>
>>
>> On 8/14/2014 4:14 AM, Sergey Tauger wrote:
>>
>>>
>>> George,
>>>
>>> Thank you for useful links. I'm going to write my own deconvolution
>>> software or
>>> contribute to existing. At the moment I use DeconvolutionLab and COSMOS
>>> for
>>> PSF generation. To my regret,  COSMOS is a legacy software, but the
>>> only
>>> that
>>> supports PSF generation according to Haeberle and Török
>>> models.
>>> I'd like to
>>> speedup PSF generation, 1hr for 256*256*40 is too much.
>>>
>>> I do not overestimate my programming skills, so I'd better use a
>>> existing
>>> GPU-
>>> based PSF generation software as a "best practice" guide than
>>> writing my
>>> own
>>> implementation from scratch. If I succeed in speeding up PSF
>>> generation,
>>> I'll post
>>> a link to github in this thread.
>>>
>>> Best,
>>> Sergey
>>>
>>>
>>>
>>
>>
>> --
>>
>>
>>
>> George McNamara, Ph.D.
>> Single Cells Analyst
>> L.J.N. Cooper Lab
>> University of Texas M.D. Anderson Cancer Center
>> Houston, TX 77054
>> Tattletales http://works.bepress.com/gmcnamara/42
>>
>


--



George McNamara, Ph.D.
Single Cells Analyst
L.J.N. Cooper Lab
University of Texas M.D. Anderson Cancer Center
Houston, TX 77054
Tattletales http://works.bepress.com/gmcnamara/42