Data storage - NAS servers

Adam Glaser

Data storage - NAS servers

*****
To join, leave or search the confocal microscopy listserv, go to:
http://lists.umn.edu/cgi-bin/wa?A0=confocalmicroscopy
Post images on http://www.imgur.com and include the link in your posting.
*****

Hello everyone,

We are currently exploring in-lab data storage options for storing the large amounts of data our light-sheet microscopes are producing (while simultaneously providing shared access from multiple PCs for processing and/or visualization).  Given these needs, we have tentatively arrived at purchasing a NAS server.  I was wondering if anyone had suggestions or recommendations on the best companies/options available.  In preliminary searching I have seen FreeNAS pop up several times, but I am quite a novice when it comes to data storage - so any input from the listserv would be greatly appreciated!

Thanks!
Adam
Michael Giacomelli

Re: Data storage - NAS servers

Hi Adam,

We use 12-bay Synology NAS devices, currently the RS2416RP+ model.
Each rack-mount device can further support an expansion bay, for a
total of 24 disks per device.  The base device costs about $2,300,
and when that fills up, another $1,200 buys an expansion bay.  That
works out to about $3,500 worth of server per 24 disks, or roughly
$150 of server per disk.

Setup and management is easy: the software automatically updates
itself and emails you if there is a problem or a disk failure.  We run
two-disk redundancy per 12-disk set.  Performance is acceptable for
what we do (dump a few hundred GB of images at a time into long-term
storage), but you could do better if you needed to; the CPUs are quite
slow.

We have bought them from SimplyNAS before.  They are OK; they did
mess up one order in the past and we had to exchange the hardware
they sent, but we like that they will bundle the disks in with the
unit, which is helpful for us for accounting reasons.  There are many
other vendors (Amazon, etc.).  Backblaze publishes disk reliability
statistics, so we typically buy what they say is reliable:
https://www.backblaze.com/b2/hard-drive-test-data.html

You can do it more cheaply using PC hardware and storage-optimized
Linux distros, but we typically run the racks for ~7 years before
retiring them, which means you need to make sure someone will be
around who knows how to rebuild a RAID array when a disk fails, which
happens from time to time.  We were concerned about this since
students leave over time and we don't necessarily have someone
available in any given year who knows older Linux systems.

Mike

Mel Symeonides

Re: Data storage - NAS servers

Hi Adam,

(warning: long email ahead)

For our iSPIM we have the 8-bay Synology DS1817+ (about $900-$1000)
loaded with 8 x Western Digital Gold WD101KRYZ 10 TB hard drives (about
$400 each), set up in SHR-2 mode (which is basically RAID-6, i.e.,
2-disk redundancy), which leaves about 60 TB of usable space. I also
installed an Intel X520-DA2 10GbE adapter, as the integrated Gigabit
Ethernet connection would bottleneck the speed of the drive array; I
was able to find one of those for $150, but it was a refurb. You'll
also need a 10GbE adapter for your PC and the appropriate SFP+ cable(s).
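
(As a rough sanity check on capacity and cost, here is a quick back-of-the-envelope calculation using the approximate prices above; the cost-per-TB figure is my own arithmetic, not a vendor number.)

# Rough capacity/cost numbers for the setup described above (Python).
n_drives = 8
drive_tb = 10
redundancy = 2                              # SHR-2 / RAID-6: two drives' worth of parity
usable_tb = (n_drives - redundancy) * drive_tb      # -> 60 TB usable

cost_usd = 1000 + 8 * 400 + 150             # NAS + drives + refurb 10GbE adapter
print(usable_tb, "TB usable,", round(cost_usd / usable_tb, 1), "USD per usable TB")
# -> 60 TB usable, 72.5 USD per usable TB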

Once this fills up, and we have exhausted all possibilities of
compressing or discarding data, I think we will just buy another one as
opposed to buying an expansion bay for this one, as the expansion bay
runs on the eSATA connector and is therefore limited in speed, so you
get no speed gains from any RAID array that's in the expansion bay.

Just be aware that, with regular SATA 7,200 rpm HDDs (i.e., not SSDs or
15K rpm SAS drives), the best real-world performance you will ever get
(with an 8-drive RAID-6 array over 10GbE) is about 160-170 MB/s read
(that's bytes, not bits) and 270 MB/s write. Teaming two 10GbE
connections by link aggregation does nothing for real-world
performance; it's just for failover redundancy in case one port or one
cable dies mid-transfer. For speed benefits you need to optimize your
operations to split your packet stream across each link, which is not
trivial. These speeds are way lower than what you get with a
benchmarking tool like CrystalDiskMark - it always pays to do
real-world work to figure out how something performs.
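
(To put those rates in context, here is a quick, hypothetical estimate of what they mean for moving a light-sheet-sized dataset; the 1 TB figure is just an example.)

# Time to move a 1 TB dataset at the real-world rates quoted above.
dataset_gb = 1000                       # example dataset size, GB
read_mb_s, write_mb_s = 170, 270        # measured NAS rates from above

print("copy off the NAS: ", round(dataset_gb * 1000 / read_mb_s / 3600, 1), "h")
print("copy onto the NAS:", round(dataset_gb * 1000 / write_mb_s / 3600, 1), "h")
# -> roughly 1.6 h to read back and 1.0 h to write, per terabyte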

This kind of performance means that you can pretty much forget any
multi-user applications with this kind of NAS for large imaging
datasets. You will just not be able to provide access for anything more
than someone copying their data off to their own system to then process
and analyze off a local drive/array, and you should only allow one user
to be logged in at a time or they will all complain that it's taking
ages. Also, if you'll be using your building's built-in Ethernet wiring,
it's unlikely that it will be capable of 10GbE performance, so really
you'll end up asking people to bring their portable drive down to you
and copy the data right off as it'll end up being 2-3x faster (of
course, that also presumes that their portable drive isn't bottlenecking
either). I also have a 6 TB RAID-0 SSD array in my workstation PC, so
whenever I need to process/analyze data, I copy it off the NAS to the
SSD RAID over the 10GbE connection, do all the work on the SSD RAID, and
then copy it back to the NAS for storage.

If you want to actually serve large datasets directly to multiple
simultaneous users, these Synology units will just not cut it. You'll
need an enterprise solution, e.g. something from Dell, HP, or NetApp,
which will cost somewhere in the tens of thousands of USD for anything
close to the 60 TB I mentioned. These will run large arrays of either
SSDs (super expensive) or at least fast SAS HDDs (only very expensive),
and will run on a multi-CPU Xeon platform with tons of RAM that is
capable of handling multiple users. The Synology units have really awful
CPUs and very low RAM, which can just barely handle a single user.
You'll also need the necessary infrastructure in your building to allow
remote users to connect at 10GbE speeds (and have them install 10GbE
adapters in their computers), or it will totally defeat the point of
having spent all that money on the enterprise server.

If you're really nuts you can try to build your own and could
conceivably get performance close to what these enterprise units get for
somewhat less money, but you really need to know what you're doing. Plus
you will be entirely responsible for servicing it. You'll need to have a
drive failure plan because server performance will tank for several days
if you have to replace a failed drive and have to rebuild the array -
your users will be very unhappy during the rebuild, and the larger the
array/the slower the drives, the longer the rebuild will take.
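
(For a feel for how long a rebuild can take, here is a rough estimate; the sustained rebuild rate is an assumption, and real rebuilds depend heavily on the controller, drive size, and ongoing load.)

# Rough rebuild-time estimate: capacity of the failed drive / rebuild rate.
drive_tb = 10                   # size of the failed drive (example)
rebuild_mb_s = 100              # assumed sustained rebuild rate under light load

hours = drive_tb * 1e6 / rebuild_mb_s / 3600
print(round(hours), "hours")    # -> ~28 hours, i.e. more than a day per 10 TB drive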

Honestly, institutions that host big data labs will sooner or later need
to start taking responsibility for the data if they are going to expect
their labs to continue doing their work and bringing in grant money for
the institution. It makes much more sense for there to be an
institutional "data core facility" to maximize cost efficiency in
purchasing these servers, buildings will need to be rewired with fiber,
and personnel will need to be hired to maintain these servers and
provide user support, all of which costs institutions $$$, so good luck
with that.

No commercial interest for anything I mentioned.

Mel


--
Menelaos Symeonides
Post-Doctoral Associate, Thali Lab
Department of Microbiology and Molecular Genetics
University of Vermont
318 Stafford Hall
95 Carrigan Dr
Burlington, VT 05405
[hidden email]
Phone: 802-656-1161
Steffen Dietzel

Re: Data storage - NAS servers

Hi Adam,

We are using a Synology DS3612xs: 12 bays, each filled with a 4 TByte
disk. That was a huge size when we got it several years back (2011?). It
runs in a RAID 6 configuration, which paid off when one of the disks
failed. An annoying beep alerted me to get a new one. It was rather
comforting at the time to know that another disk could fail without
ruining the data.

The major advantage of this machine is that it does not need much care.
Once installed it just runs. The web interface is easy to use, the
software updates itself automatically. Adding additional HDs is very
easy (we started out with 5, now 12). I liked it so much that I got
myself a 2-bay Synology at home. Same user interface.

When we moved to a new building I added a 10 Gbit glass fiber network
card. While in theory this should give us ~1 GByte/s, in reality it
achieves ~500 MByte per second. Still not bad compared to the maximum
of ~120 MByte/s on 1 Gbit Ethernet.

I have no experience with other NAS systems and (sadly) no financial
interest in Synology.

Steffen


--
------------------------------------------------------------
Steffen Dietzel, PD Dr. rer. nat
Ludwig-Maximilians-Universität München
Biomedical Center (BMC)
Head of the Core Facility Bioimaging

Großhaderner Straße 9
D-82152 Planegg-Martinsried
Germany

http://www.bioimaging.bmc.med.uni-muenchen.de
Kate Luby-Phelps

Re: Data storage - NAS servers

Other options besides Synology are Buffalo and QNAP. We have some of each. The Buffalo Terastations are particularly reliable. We have had them for more than ten years, and disks only recently started failing. You can order Western Digital disks and replace them yourself, and the RAID array rebuilds automatically. We don't use our NAS for archiving, just for temporary storage and data serving. You don't have to know Unix to administer either Buffalo or QNAP, since they have web GUIs for that. Both Buffalo and QNAP offer 10 Gbps NAS units. I have been very happy with the telephone support for both Buffalo and QNAP. As has already been mentioned, it is the read/write speed of the disks that sets the upper limit on data transfer rates, but for large files 10 Gbps will be noticeably faster if you have it available.
Adam Glaser

Re: Data storage - NAS servers

Thanks everyone for your responses - this has been very helpful.  It sounds like there are many things to consider, especially if we are hoping to interact with the data while it is on the server, rather than just dump the data purely for long-term storage.  Two scenarios we are considering are:

1. Record a dataset (0.5 - 1.5 TB in size) on our acquisition workstation (4x2TB RAID0 SSDs, equipped with 10G fiber, 256 GB RAM) -> post-process on this same workstation -> transfer final processed dataset (we use the HDF5-based Imaris format) to NAS server for storage.

- In this scenario, would it be feasible to open, visualize, and interact with this IMS file, while it sits on the NAS server, from 1 or 2 additional workstations with similar specs to the acquisition workstation via 10Gb?  And in terms of RAM, is the important factor in this scenario how much RAM the workstation has, or how much the NAS server has?

2. Record a dataset (0.5 - 1.5 TB in size) on our acquisition workstation (4x2TB RAID0 SSDs, equipped with 10G fiber, 256 GB RAM) -> transfer raw data to NAS server -> post-process data on the NAS server using a secondary workstation with similar specs to the acquisition workstation via 10Gb.

- Would this scheme be feasible, and again is the limiting factor for our post-processing the RAM/CPU on the NAS server or the RAM/CPU on the secondary workstation?

Based on everyone's input, we are considering the Synology 16-bay NAS RackStation RS4017xs+.  The specs include the 16 bays, an 8-core processor, 8 GB of RAM (expandable to 64 GB), two 10 Gb fiber ports, and two additional PCIe 3.0 slots.  However, if we will not be able to interact with the data as planned while it is on the NAS server, we may just use AWS or on-campus options for simply storing and backing up the data.  What initially attracted us to the idea of an in-lab NAS server was the combination of storage + accessibility of our datasets.

Apologies if these are confusing or naive questions!

Thanks,
Adam
Mel Symeonides

Re: Data storage - NAS servers

Hey Adam,

For option 1, if you're looking to do real-time 3D/4D visualization of
the data straight off the NAS, it will be very slow (I bet it's already
plenty slow even off your SSD RAID for very large volumes... imagine
about 10% of that speed off the NAS). The bottleneck here will be the
speed of the HDD array on the NAS, which is in turn related to the drive
interface (in the case of that Synology NAS, it's SATA, which is the
slowest) as well as the CPU and RAM of the NAS.

Your workstation is plenty powerful for any NAS and will never be the
bottleneck, and the 10GbE interface will not be maxed out unless your
NAS is loaded with SSDs. You will see better performance if the drives
on the NAS are connected via SAS (which the NAS would specifically have
to support), which run full-duplex (simultaneous read-write streams) and
spin at much higher speeds (10K or 15K RPM vs. 7.2K RPM for most SATA
drives), but honestly at that point I would maybe just go with SSDs
(we're talking huge cost increases at this point either way). You will
most likely see some performance gains in real-world operations with a
cheap/slow SATA HDD array by increasing the RAM on the NAS (the more the
better), and by using those PCIe slots for NVMe SSD caching.

Option 2 might work, depending on how many datasets of that size you
think you'll be getting every day/week and what processing is required
(I'd guess deconvolution, deskewing, masking/thresholding?). You should
expect that it'll take a day or more to process a dataset that lives on
the NAS, whereas it'll take a few hours if it's on a local SSD RAID.
However, it sounds like quite a high cost to set up a separate analysis
workstation just to cope with the slowness of the NAS.
Would it just make more sense to have a combined acquisition/analysis
workstation and have scripted analysis run when you're not acquiring new
data, e.g. overnight?

Mel


Michael Giacomelli

Re: Data storage - NAS servers

In my experience, performance on the Synology machines scales quite
linearly with the number of file accesses per second, so if you work
with a lot of large files, you will easily bottleneck on 1 Gbit
Ethernet while the load on the server stays small, even with the
cheaper models that use slow Intel Atom processors.  If you work with
small files (e.g., a lot of 50 KB JPEGs, as when making Deep Zoom
images), you end up completely bottlenecking the Synology.  With a
higher-end Synology like the one you're looking at (probably 2-3x the
CPU power), more disks, and relatively large image files,
extrapolating, I think you'll have at least a few hundred MB/s even
with multiple concurrent users.

If you think you may need faster than that, it might be worth looking
up some benchmarks.  High performance NAS is a big market, so I'm sure
there is information out there.
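
(If you do want a quick DIY number before buying, a rough sketch along these lines, run against a share mounted from the NAS, will show the large-file vs. many-small-files difference described above; the target path and file sizes are placeholders to adjust.)

# Crude throughput test: a few large files vs. many small files (Python).
import os, time

TARGET = "/mnt/nas_share/bench"      # placeholder: a directory on the mounted NAS share
os.makedirs(TARGET, exist_ok=True)

def write_files(n_files, file_size):
    buf = os.urandom(file_size)
    t0 = time.time()
    for i in range(n_files):
        with open(os.path.join(TARGET, f"bench_{i}.bin"), "wb") as f:
            f.write(buf)
    dt = time.time() - t0
    print(f"{n_files} x {file_size/1e6:g} MB: {n_files * file_size / 1e6 / dt:.0f} MB/s")

write_files(10, 500 * 1000 * 1000)   # large files, like image stacks: typically network-limited
write_files(10000, 50 * 1000)        # 50 KB files, like Deep Zoom tiles: typically NAS-limited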

Mike

Adam Glaser

Re: Data storage - NAS servers

Thanks Mike and everyone else for the continued advice.  It sounds like, although we would not get maximum theoretical transfer speeds to a NAS server, with a larger drive array and more server RAM and CPU we should be able to get relatively decent speeds as well as many TB of storage.

I have one last question for everyone.  Has anyone tested streaming data from sCMOS cameras to a PCIe SSD?  

I understand that the norm is streaming to a RAID0 SATA SSD array, like we have now in our workstation.  The advantage of the SATA SSD array is that you get a proportionally larger drive size and speed, e.g. 8 TB and >1 GB/sec for our setup with 4x2 TB SSDs.

The latest PCIe SSDs (like the Samsung 960 Evo/Pro) get equivalent speeds, but without the ability to proportionally increase the drive size.  But - since our raw dataset sizes are generally around 1 TB, a single 2 TB PCIe SSD would be able to capture an entire dataset for us.  Then, after acquisition, the raw data could be dumped from the PCIe SSD to the HDD RAID array in the SATA slots.  This would provide many TB of storage, albeit at reduced speeds.  It would allow a single workstation to acquire data and also have increased storage (although not as large as a NAS server with 12 or 16 bays).
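
(As a rough, hypothetical check on that workflow: the arithmetic below just plugs the ~1 TB dataset size into assumed rates; the 800 MB/s camera stream and 300 MB/s HDD-RAID write rate are assumptions, not measured values.)

# Rough timing for the NVMe-capture -> HDD-RAID-offload workflow above.
nvme_capacity_gb = 2000     # a single 2 TB PCIe SSD
camera_mb_s = 800           # assumed sustained sCMOS stream rate
hdd_raid_mb_s = 300         # assumed sequential write rate of the HDD RAID array

record_min = nvme_capacity_gb * 1000 / camera_mb_s / 60
offload_h = 1000 * 1000 / hdd_raid_mb_s / 3600      # moving a 1 TB dataset
print(round(record_min), "min of recording per fill;", round(offload_h, 1), "h to offload 1 TB")
# -> ~42 min of recording before the 2 TB drive fills; ~0.9 h to offload each dataset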

Thanks!
Adam
Michael Giacomelli

Re: Data storage - NAS servers

Hi Adam,

I haven't done sCMOS, but we do stream from Alazar PCIe A/D boards
(2-3 GB/s) when doing imaging. Getting close to the benchmark rates is
challenging with NVMe.  You need large transfer blocks, and ideally
several running in parallel. The included Alazar software can easily
saturate a 960 Evo (it'll do > 6 GB/s to RAM), or a RAID0 of 4x840
Pros, but I never figured out what they were doing to make it work so
well.  I tested a few different Win32 APIs, but ended up just doing
fopen/fwrite and then spinning off a lot of parallel threads for
writing.  That more or less saturated a single 960 Evo's sustained
performance (see below), but I'm not sure how well it'd scale to
faster disks.
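
(For illustration, here is a minimal Python sketch of that "large blocks, many parallel writer threads" idea - not the actual fopen/fwrite code described above, and the chunk size and thread count are assumptions you would tune for your own SSD. Buffered file writes release the GIL, so a thread pool can keep several writes in flight at once.)

# Minimal sketch: stream large chunks to one file using a pool of writer threads.
from concurrent.futures import ThreadPoolExecutor, wait

N_THREADS = 8                        # assumed; tune to your SSD and CPU

def write_chunk(path, offset, data):
    # Each worker opens the file independently and writes its chunk at its offset.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)

def stream_to_disk(path, chunks):
    """chunks: iterable of large (tens-of-MB) bytes blocks from the acquisition."""
    open(path, "wb").close()         # pre-create the output file
    offset, futures = 0, []
    with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
        for chunk in chunks:
            futures.append(pool.submit(write_chunk, path, offset, chunk))
            offset += len(chunk)
        wait(futures)                # block until all outstanding writes finish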

By the way, I'd recommend against getting a 960 Evo.  They have very
fast burst writes to the device's SLC cache, but sustained performance,
once the SLC is exhausted and you fall back to TLC memory, is worse.
The 960 Pro is MLC and will sustain higher data rates if you want to
completely fill the disk in one go.

You could also get several cheaper NVMe devices (EVOs or other TLC)
and software RAID them (which works surprisingly well) if you don't
want to pay for the Pro or need more sustained write than a single
device can handle.  Some newer boards support splitting the PCIe x16
slot into 4x M.2, in which case you can fit a lot of NVMe devices in
one system.

Mike


Craig Brideau

Re: Data storage - NAS servers

The bleeding edge right now is solid-state M.2 and U.2 drives that use
the PCIe bus to communicate with the mainboard. You can buy PCIe cards
that host these drives, and their access speed is greater than that of
any of the available SATA formats. Each drive takes 4 PCIe lanes,
though, so you need to make sure you have the lanes available on your
motherboard. The newer Ryzen chips tend to support more PCIe lanes.

Craig
Olaf Selchow

Re: Data storage - NAS servers

*** please note that I have commercial interests in what I write here in that I am working with the mentioned solution provider in a number of projects ***


Hello Adam, Hello List,

I hope I can add a bit to the earlier posts, since all of them have already raised the relevant topics.

I look at this topic with the following background: I have been working on a number of light sheet projects, and an appropriate data solution has always been a key part of the successful use of the setups.  My experience includes commercial systems (e.g., Zeiss, Luxendo, ASI/3i) and home-built light sheet systems.  As a reference for data rates, I always keep in mind that one sCMOS camera (4 MPixel, 16 bit) acquiring at full speed (100 fps) delivers approximately 800 MByte/s.  Hence, most light sheet systems easily produce 800 MByte/s or even 1.6 GByte/s over extended periods of time.
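
(That 800 MByte/s figure follows directly from the camera parameters; a quick check, assuming a 2048x2048 sensor:)

# Data rate of one sCMOS camera: 4 MPixel x 16 bit (2 bytes) x 100 frames/s.
pixels = 2048 * 2048             # ~4 MPixel sensor (assumed geometry)
bytes_per_pixel = 2              # 16-bit images
fps = 100

rate_mb_s = pixels * bytes_per_pixel * fps / 1e6
print(round(rate_mb_s), "MB/s")  # -> ~839 MB/s, i.e. roughly 800 MByte/s per camera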

My major learnings, also with respect to your 2 scenarios, were

- Copying the data from a primary acquisition volume to a second volume for processing or safer/longer storage has always become a bottleneck very quickly. I'd recommend going for a solution that allows delivery of the data from the microscope straight to the storage where it is supposed to stay, where it is safe, and where it is at the same time accessible for processing. This requires fast and safe volumes and fast networks. Typically, larger RAID systems of SAS HDDs (e.g., 15+ enterprise SAS HDDs) will form the basis; suitable RAID controllers and multiple dedicated 10 Gbit network connections are needed. If well configured, you can get >1 GByte/s saved and at the same time keep capacity for simultaneous processing I/O. The RAID controller is a key component for fast I/O! Many of them fail to support the data rates that the disk arrays per se would allow.

- Pulling data over the network into a processing unit is slow, especially when multiple users are supposed to work simultaneously. A 10 Gbit label on the cable/adapter will not necessarily save you. Maybe a direct 10 Gbit connection is OK, but the built-in network of a research institute will typically not get you the performance you need. This is even more important when working with I/O-intensive processing like 3D visualization and analysis or deconvolution. Your processing unit, with its CPU/RAM/GPU capacities, should be "close" to the data: directly connected over a bus system like SAS; 10 Gbit Ethernet is only second choice here.

- Together with the processing resources (RAM, CPU or GPU), the limiting factors are usually network bandwidth and i/o to/from the data volume.

- Scalability is essential. If you start with 30 terabytes today, you should have a plan from the beginning for how to expand that capacity without re-formatting huge volumes. Also, the option of expanding RAM and the number of GPUs is worth considering.

Therefore, in my opinion, a NAS is often too slow if more than just storage is required. It has been mentioned in the earlier posts that good commercial ("enterprise") solutions are available. Of course, you can build performant solutions yourself, if you know what you are doing and if you have the people and time to do it. If you buy, make sure the provider knows your applications. If they usually supply big-data servers for relational databases but don't know what a microscope is, you might just as well configure the system yourself. I have gone through this the hard way.

The best solution for image data that I am aware of has been developed and is sold by a joint team of microscopy experts, IT hardware experts and network specialists: The HIVE platform by the company ACQUIFER (https://www.acquifer.de/data-solutions/, [hidden email]).

ACQUIFER is a solution provider, and they go with you through all steps of assessing your needs, talking to your local IT people, shipping, installing, supporting your applications and servicing the platform. Importantly, they work together with microscope and camera manufacturers, run tests, and are keen to support your applications.

Therefore, the HIVE is more cost-intensive than a simple NAS or a basic enterprise solution. But you get a most competent partner who also stays with you after the installation is done.

The ACQUIFER HIVE is a professional solution, combining fast and safe RAID storage from 50 TB to Petabytes with high end processing units, including solutions for GPU-intensive processing.

It is modular, allowing it to be configured to your needs (CPU/RAM/GPU and storage volume), whether it is dedicated to a single high-speed microscope or set up as a central data solution for core facilities. And it can easily grow later. It also comes with a dedicated network module including a router, firewall, uninterruptible power supply, and multi-10 Gbit network switches for a dedicated and direct connection from microscope to storage.

It runs Windows Server (multi-user access and easy remote operation!), and new releases of common software packages for image processing are regularly tested to ensure they work fine.

High data rates and large data sets from all modern microscopy modalities, such as light-sheet, fast confocal and super-resolution microscopes and also cryo-electron microscopes, can be acquired and processed in one multi-user platform.

In summary, it is a small computing center that is easy to manage and can go under the desk of the microscope, right where it is needed.

If you want to discuss with me offline or if you want me to forward you to the ACQUIFER team, feel free to email me.

Best regards,
Olaf

____________________________
Dr. Olaf Selchow
Microscopy & BioImaging Consulting
[hidden email]
+49 172 3286313
skype: olaf.selchow