File Archive Segment Size

I usually go with no segment-size limit on the archive file, but sometimes I receive a heads-up from IT because a very large file is awkward to transfer… so I understand the file size / number of segments mostly matters for transferring, uploading, and so on.

Good point. I think Dropbox tops out at 100GB/file on most accounts. I’m not sure about G-Drive, though G-Drive has a 750GB/day/account limit even for Enterprise accounts, which is insanity in its own right.

For anything above 100GB per single file, you definitely need to test what’s feasible. AWS, Lucid, and MASV handle those without issue.


The optimal segment size can really depend on the storage environment, and the wrong choice by someone who hasn’t been told what works best in your environment can cause a lot of grief.

So although you can do this using the command line archiving tool, does anyone know of a way to force the default archive segment size in the Flame UI to avoid having to rely on the user to always remember to set it correctly? Any magic env var?

It seems to default to the last setting used.
What are some scenarios that can cause grief?

I had a weird one where I archived to 100GB segments and shared via Dropbox, and the other party downloaded it, but the segments were coming in at just 20GB or so. Couldn’t figure out what went wrong. Never had a problem like that before on Dropbox.

“optimal segment size” is very much a Goldilocks situation. Too small and you end up with thousands of files in the same directory, which is never good. Too large and you end up with enormous files you can’t do anything with.

Also in a storage environment where you’ve got some kind of intelligent tiering going on, you want to be careful to pick a segment size which works well with your tiering solution.

But thanks for the hint about the segment size being persistent. Presumably it gets saved somewhere in a user setting, so it might be possible to pre-populate that.

I use the 4.7G size because it’s pretty easy to move around.

A few years ago, when using Flame on AWS, I learned that some S3 bucket transfer apps can’t handle anything above 5GB unless they’re multi-threaded. I’m not sure what proportion of single-threaded apps are still out there, but at the time most of them had trouble with large files.


That is actually true of most cloud storage solutions, not specific to AWS. The problem is latency in the network connection. If you transfer large files to the cloud via a single stream, the transfers are very slow because the latency between packets slows things down. Most apps (CyberDuck, Lucid, etc.) actually open 64, 128, or even more concurrent streams, each transferring just a chunk of the file (usually 64MB or something like that). Because these streams run concurrently, they can saturate your network connection despite the latency.

However, once all these chunks are uploaded or downloaded, they have to be re-assembled into the large file. That’s why you see some cloud transfers get to 99% and then suddenly pause for what seems like a long time: the data is all there, but it’s being pieced together. If you download with CyberDuck you can see all these separate segments in a folder and watch it re-assemble them. In the S3 API this is CreateMultipartUpload, UploadPart, and CompleteMultipartUpload.
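For the curious, here is roughly what that looks like at the API level. A minimal boto3 sketch, with the bucket, key, and part size as made-up example values; real transfer apps upload many parts concurrently, while this does them one at a time for clarity:

```python
import boto3

# Example values -- substitute your own bucket, key, and source file.
BUCKET = "my-archive-bucket"
KEY = "archives/project_x.seg001"
SOURCE = "/mnt/archive/project_x.seg001"
PART_SIZE = 64 * 1024 * 1024  # ~64MB chunks, like the apps described above

s3 = boto3.client("s3")

# 1. Start the multipart upload and get an upload ID.
upload = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)
upload_id = upload["UploadId"]

# 2. Upload the file one chunk at a time (real apps push many parts in parallel).
parts = []
with open(SOURCE, "rb") as f:
    part_number = 1
    while True:
        chunk = f.read(PART_SIZE)
        if not chunk:
            break
        resp = s3.upload_part(
            Bucket=BUCKET, Key=KEY,
            PartNumber=part_number, UploadId=upload_id, Body=chunk,
        )
        parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
        part_number += 1

# 3. Tell S3 to stitch the parts back into one object -- the "pause at 99%"
#    step, except here the re-assembly happens server-side.
s3.complete_multipart_upload(
    Bucket=BUCKET, Key=KEY, UploadId=upload_id,
    MultipartUpload={"Parts": parts},
)
```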

There are a number of variables at play that you can optimize. The latency will be different if your transfer crosses the continent or even goes overseas vs. staying within the local cloud region, so you may need more or fewer concurrent connections (which are configurable in most apps) depending on that. There’s also a sweet spot between how many files the app is willing to transfer in parallel (GoodSync, for example, defaults to 3) and how many concurrent connections may be open across all the transfers in action. Sometimes small file transfers create more overhead and don’t utilize all the concurrent connections.

No formula fits all cases. But if you do a lot of cloud transfers for your archive, it’s worth doing some testing to find the sweet spot.
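If you script your own transfers instead of using one of those apps, the same knobs are exposed in boto3’s TransferConfig. A rough sketch, with the threshold, chunk size, and concurrency purely as starting points to test against rather than recommendations:

```python
import boto3
from boto3.s3.transfer import TransferConfig

# Starting-point values to test against -- tune per the discussion above.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # switch to multipart above ~100MB
    multipart_chunksize=64 * 1024 * 1024,   # ~64MB parts
    max_concurrency=64,                     # concurrent streams per file
)

s3 = boto3.client("s3")
# Example file/bucket/key -- substitute your own.
s3.upload_file(
    "/mnt/archive/project_x.seg001",
    "my-archive-bucket",
    "archives/project_x.seg001",
    Config=config,
)
```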

Also, cloud storage doesn’t have the same limitations as local file systems when it comes to things like the number of files in a folder. Cloud storage is always object-based, which is totally different from a disk filesystem.


For modern tape drives (LTO-8 and newer), a segment size between 256MB and 1GB is typically recommended. This range balances modern tape drive buffers (usually 1GB+) against restore granularity:

  • 256MB–512MB is better if you often need to restore individual files.
  • 1GB+ can improve throughput for pure archival storage where you rarely need to restore.

We default to 75GB @johnt


The Development Team has accepted this into their database as a valid Defect; here is the official ID for your records:

FLME-69209 - Very long time to close archives with lots of appends


This slowdown occurs because:

  • The archive header is read by Flame.
  • The new data is appended.
  • A new header is compiled, along with the associated metadata/thumbnails/web pages, and written to disk.
  • The new header gets written back to the archive.

Depending on your configuration this is most likely all written to the system disk, where it is in contention and queued.

You can speed all of this up by:

  • Using faster processors.
  • Using faster system disk(s).
  • Using a different location for the thumbnails/html/xml/atoc/otoc.
  • Using a faster archiving system. (NAS, SAN, whatever)
  • Using Backburner to do these tasks in the background.
  • Assigning all archiving tasks to a robot that works while you sleep. (cron)
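On the cron idea, the wrapper can be very small. A sketch of a nightly job, assuming the flame_archive command-line tool; the path and flags below are placeholders, so check the tool’s help on your install before using anything like this:

```python
#!/usr/bin/env python3
"""Nightly archive wrapper, intended to be run from cron (e.g. `0 2 * * *`)."""
import datetime
import subprocess
import sys

# Placeholder command: fill in the flame_archive (or other archiver) invocation
# that works on your install; the path and flags vary by version.
ARCHIVE_CMD = ["/opt/Autodesk/io/bin/flame_archive", "<flags-for-your-version>"]
LOG_PATH = "/var/log/flame_nightly_archive.log"  # example location

def main() -> int:
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(LOG_PATH, "a") as log:
        log.write(f"{stamp} running: {' '.join(ARCHIVE_CMD)}\n")
        result = subprocess.run(ARCHIVE_CMD, stdout=log, stderr=subprocess.STDOUT)
        log.write(f"{stamp} exit code: {result.returncode}\n")
    return result.returncode

if __name__ == "__main__":
    sys.exit(main())
```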

Not sure how much of an impact these have. We used to do this at the mill but still got hideously slow archives.

I’m with you on automated background archiving. However I’m still keen to speed up the archiving because the worst thing in the world would be to come back in the morning to a flame still trying to archive. Which I have experienced.

Also, if you choose to include Project Setups, they’re handled in the most inefficient way. I believe it is basically a tar with no compression, so you end up embedding a giant TAR file inside the Flame file archive. And it can’t split that TAR file, so your segment size needs to be bigger than that TAR, and you have no idea how big the TAR will end up being, so you basically should just make your segment size 25GB minimum. And then if you go to update that archive, it needs to embed a whole new setups TAR, so the segment size needs to accommodate that too. It’s all a shit archaic legacy system.
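One way to take the guesswork out of the setups TAR is to measure the setups folder before archiving and pick a segment size comfortably bigger than it. A quick sketch, assuming a hypothetical project path (locations vary by install and version):

```python
import os

# Hypothetical path -- Flame project locations vary by install and version.
SETUPS_DIR = "/opt/Autodesk/project/my_project"

def dir_size_bytes(root: str) -> int:
    """Total size of every file under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # skip files that vanish or can't be stat'd
    return total

size_gb = dir_size_bytes(SETUPS_DIR) / (1024 ** 3)
print(f"Setups are ~{size_gb:.1f} GB; pick a segment size comfortably above that.")
```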

I hear you and agree @ALan

@johnt - writing tens of thousands of files to your system disk absolutely has an impact.

untar-ing and tar-ing tens of thousands of files absolutely has an impact.

processing unique IDs for files and the associated markup language for tens of thousands of files absolutely has an impact.

many of these files are negligibly small so reading and writing them is demonstrably impeded or accelerated depending on the block size of your storage.

writing tens of thousands of files over a network to a non-optimal file store can be catastrophically slow.

which brings us all back round to tar-ing the files into one monolithic file, thereby only taking out one workstation, then moving that massive chunk to the file-store over the network.

Yes. I think I understand. So we need a different solution and I feel that might be coming based on aspects of the 2026 release.

@johnt - you may be right - postgres may go some way to climbing a different mountain.

Aren’t project setups just ASCII files? Granted my projects are small, but my setups rarely top 300MB on a 5TB archive. The only time I’ve seen project setups go through the roof was when people would save media to the images folder.