Message Boards » Tech Talk » Reducing IO bottleneck
darkone

I thought I'd get the opinions of some of the folks around here on a problem I've been debating how best to manage.

I work with satellite data. The data sets I use are multi-terabyte and can't be stored on a typical desktop workstation. The analysis I usually perform involves aggregating data from all the files in a data set. In my lab, we store the files on several servers accessed across the LAN over 100 Mbps links. For my work, processing time is always IO-limited.

What suggestions would you have for removing the IO bottleneck? Should I be considering something like fiber links? Expensive networking hardware? How would you do this on a really small budget? I'm looking at situations where reducing the IO bottleneck by 20% would save days or even weeks of processing time per project.

3/9/2010 1:16:33 PM

qntmfred

Are you saying the IO bottleneck is network IO or disk IO?

As for network, it should be very easy to upgrade to gigabit. But even then, your disk IO is most likely going to be the limiting factor anyway; you probably aren't even maxing out the 100 Mbps. If you're able to upgrade your hard drives, look for drives with high IOPS numbers. There's typically not a lot of room to upgrade unless you go to SSD, though, which is $texas.


[Edited on March 9, 2010 at 1:58 PM. Reason : pick up a few of these http://www.ramsan.com/products/ramsan-20.htm]

3/9/2010 1:26:09 PM

FroshKiller

RAMDISK

3/9/2010 1:31:17 PM

darkone

^^ The data on the servers is on 12-disk RAID 5 volumes (7200 RPM SATA drives). In theory, the data never touches the HDD on the local workstation; it stays in memory the whole time it's being used.

Sadly, my lab is at the mercy of NCSU and departmental IT for the network hardware between the workstations and the servers. I'd love to install gigabit switches, but it will be a cold day in hell before they put up the money for new network hardware.

I wish we could afford SSDs. I shudder to think what 100 TB of SSDs would cost.

3/9/2010 3:22:27 PM

greeches

I would say upgrade to at least gigabit or add more disks. You are either network-bound or disk-bound, so throw more spindles into the mix. RAID 5 isn't a very fast array either; you may want to try some sort of RAID 0 config for performance, if you are indeed disk-bound.

Have you figured out where your bottleneck is? (network/storage)

3/9/2010 4:12:34 PM

darkone

^ I don't know if I'm network- or disk-limited. That's a test I'll have to run.

3/9/2010 5:45:08 PM
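
(A minimal way to run that test, for the record: time a big sequential read of the same file from the network share and from a local disk, and compare against the ~12 MB/s ceiling of a 100 Mbps link. The paths below are hypothetical placeholders; on repeat runs the OS page cache will inflate the numbers, so use a file bigger than RAM or a freshly copied one.)

```python
import time

CHUNK = 8 * 1024 * 1024  # 8 MB reads

def read_throughput_mb_s(path):
    """Sequentially read a whole file; return throughput in MB/s."""
    total = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while True:
            buf = f.read(CHUNK)
            if not buf:
                break
            total += len(buf)
    return total / (time.monotonic() - start) / 1e6

# Hypothetical paths: one copy on the LAN mount, one copied to local disk.
for path in ("/mnt/labserver/satdata/sample.hdf.gz",
             "/tmp/sample.hdf.gz"):
    print(f"{path}: {read_throughput_mb_s(path):.1f} MB/s")
```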

Shaggy

RAID 5 offers good read performance and poor write performance. Given your limited budget it's probably the best choice. If you can get more disks, maybe do RAID 10 (if it is disk-limited).

Don't ever use RAID 0.

[Edited on March 9, 2010 at 6:08 PM. Reason : aaa]

3/9/2010 6:07:49 PM

Shadowrunner

This is a shot in the dark since you haven't described your work in any detail, but "satellite" + "multi-terabyte" suggests to me that you're working with sparse, uncompressed data. You might be able to dramatically reduce the amount of data you need to touch if you rework some of your analysis to take advantage of compressive sensing algorithms.

This has been my "I realize this thread is about reducing your IO bottleneck... which my solution is not" suggestion.

3/9/2010 6:36:12 PM

smoothcrim

What are the chances that this data is deduped? If you could get a decent filer that would dedupe the data and cache what's left, you could increase your IO significantly, since many blocks wouldn't have to be read from disk. I would set up OpenSolaris and a ZFS pool; you can get a lot of this functionality with essentially a JBOD and a cheap box to run it. I'd also run gigabit links, since any modern SATA disk can fully saturate a 100 Mbps connection.

[Edited on March 9, 2010 at 6:51 PM. Reason : and if you cant run 1gbe, then run 802.11n MIMO. even that is more than 100mbps]

3/9/2010 6:50:38 PM
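
(For reference, the OpenSolaris/ZFS setup described above is only a couple of commands. A minimal sketch follows, wrapped in Python purely for illustration; the pool and device names are hypothetical, and note that ZFS dedup keeps its block table in RAM, so the "cheap box" still needs plenty of memory.)

```python
import subprocess

def run(cmd):
    """Echo and execute one command, raising if it fails."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Hypothetical device names for a 6-disk raidz2 pool named "tank".
run(["zpool", "create", "tank", "raidz2",
     "c0t1d0", "c0t2d0", "c0t3d0", "c0t4d0", "c0t5d0", "c0t6d0"])
run(["zfs", "set", "dedup=on", "tank"])        # block-level dedup
run(["zfs", "set", "compression=on", "tank"])  # cheap, usually a net win
```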

Opstand

Or buy a networked storage system from me

3/9/2010 6:53:21 PM

smoothcrim

I got fucking cold-called by NetApp yesterday while their stuff was shitting the bed. The conversation that ensued was epic.

3/9/2010 7:16:29 PM

gs7

^^I came here to suggest that.

^You can't say that without giving details



[Edited on March 9, 2010 at 7:23 PM. Reason : .]

3/9/2010 7:22:20 PM

smoothcrim

Basically there's a virtually undocumented 16 TB limit on deduped data within a volume. When you hit that limit (whether it's actually 16 TB or 16 thin-provisioned TB; I guess ONTAP is protecting itself, since the volume could balloon to that size), instead of instant or dedupe clones it starts making deep copies. In the middle of the shitstorm created by this, which I'm trying to clean up, some guy from NetApp calls me and asks if I'm interested in NetApp and if I plan on buying any in the future. Hilarity followed.

3/10/2010 10:03:06 AM

darkone

Quote :
""satellite" + "multi-terabyte" suggests to me that you're working with sparse, uncompressed data"


The data are neither sparse nor uncompressed.

The data in question are satellite HDF files. We keep them gzipped since they compress by about 40-60% from their binary form.
http://en.wikipedia.org/wiki/Hierarchical_Data_Format

3/10/2010 7:24:23 PM
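
(For what it's worth, the client-side decompress-then-aggregate step might look roughly like this. It assumes HDF5-flavor files readable by h5py; HDF4 files would need pyhdf instead, and the dataset name here is a hypothetical placeholder.)

```python
import gzip
import os
import shutil
import tempfile

import h5py          # assumes HDF5; HDF4 would need pyhdf instead
import numpy as np

def dataset_mean(gz_path, dataset="/radiance"):  # hypothetical dataset name
    """Gunzip one file client-side, then aggregate one dataset from it."""
    fd, tmp_path = tempfile.mkstemp(suffix=".hdf")
    try:
        with os.fdopen(fd, "wb") as tmp, gzip.open(gz_path, "rb") as src:
            shutil.copyfileobj(src, tmp)  # files cross the network gzipped
        with h5py.File(tmp_path, "r") as f:
            return float(np.nanmean(f[dataset][...]))
    finally:
        os.unlink(tmp_path)
```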

Shadowrunner

Carry on, then. Like I said, that was a shot in the dark on my part.

3/10/2010 9:40:39 PM

Shaggy

Just to make sure: is the data decompressed client-side (after the network transfer) or server-side (prior to network transfer)?

3/10/2010 9:43:25 PM

darkone

^ Client side. The files traverse the network in their compressed state.

3/10/2010 9:55:01 PM

Shaggy

Well then! Sounds like the next step is to actually figure out if it's disk or network.

If it is network, I wonder if you could buy some (relatively) cheap SATA disks and create your own local replica of the data. How often is it updated?

3/10/2010 9:59:42 PM

darkone

^ The data sets are fairly static. I tend to update them monthly in one-month chunks.

It's hard to justify the expense of purchasing hardware to store the data locally on users' workstations, considering that we already spend a lot maintaining and expanding our primary network storage machines, and that it's very difficult to get funding agencies to authorize spending on computer hardware. It's one of those things they expect departments to provide out of grant overhead. Of course, departments that will spend money on that sort of thing are very rare; in ours we get grief about printer paper. I'm sure you can imagine how a request for workstations with 10 TB of local storage would go.

Given an unlimited budget, we would have moved to fiber and SSDs, and I could chew my way through a 10 TB dataset in less than half a day.

[Edited on March 11, 2010 at 5:25 PM. Reason : typing FTL]

3/11/2010 5:24:28 PM
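
(If a local replica ever does get funded, the monthly one-month-chunk update cadence makes keeping it current nearly free: pull only the newest month's directory. A hedged sketch; the server name and directory layout are hypothetical.)

```python
import subprocess
from datetime import date

# Hypothetical layout: one directory per month on the server, e.g. 2010/03/.
month_dir = date.today().strftime("%Y/%m/")

# Incrementally sync just the current month into the local replica.
subprocess.run(
    ["rsync", "-av", "--partial",
     f"labserver:/export/satdata/{month_dir}",
     f"/data/local-replica/{month_dir}"],
    check=True,
)
```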
