Setup FSx Lustre PERSISTENT_2 with AWS ParallelCluster 🗂️

Posted on Feb 19, 2022
tl;dr: fast filesystem for hpc clusters

FSx Lustre + ParallelCluster Architecture

Overview

At the time of writing, AWS ParallelCluster can only create FSx for Lustre filesystems of type PERSISTENT_1, SCRATCH_1 or SCRATCH_2. To use PERSISTENT_2 (announced at re:Invent 2021), create the filesystem outside of pcluster and then mount it in the cluster config.

Why use PERSISTENT_2?

Setup

From the AWS ParallelCluster docs we learn:

If using an existing file system, it must be associated to a security group that allows inbound TCP traffic to port 988.

So we’ll need to:

  1. Create the Security Group
  2. Create the filesystem & associate the security group
  3. Create a cluster that mounts the filesystem

1. Create Security Group

  1. Create a new Security Group by going to Security Groups > Create Security Group:
  • Name: FSx Lustre
  • Description: Allow FSx Lustre to mount to ParallelCluster
  • VPC: Same VPC as the cluster

  2. Create a new Inbound Rule:
  • Type: Custom TCP
  • Port: 988
  • Source: Same CIDR as the VPC, e.g. 172.31.0.0/16

Inbound Rule

  3. Leave Outbound Rules as the default:

Outbound Rule
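
If you'd rather script the security group than click through the console, here is a minimal sketch using boto3. The function names are made up for this post, and the VPC ID and CIDR are whatever your cluster uses; the group name, description and port are the same values as the console steps above. Outbound rules are left at the default, just like in the console.

```python
def fsx_lustre_ingress_rule(vpc_cidr: str) -> dict:
    """Build one IpPermissions entry allowing Lustre traffic (TCP 988) from the VPC."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 988,
        "ToPort": 988,
        "IpRanges": [{"CidrIp": vpc_cidr, "Description": "FSx Lustre from VPC"}],
    }


def create_fsx_security_group(vpc_id: str, vpc_cidr: str) -> str:
    """Create the group, add the inbound rule and return the new group ID."""
    import boto3  # imported lazily so the rule builder works without AWS access

    ec2 = boto3.client("ec2")
    group = ec2.create_security_group(
        GroupName="FSx Lustre",
        Description="Allow FSx Lustre to mount to ParallelCluster",
        VpcId=vpc_id,
    )
    ec2.authorize_security_group_ingress(
        GroupId=group["GroupId"],
        IpPermissions=[fsx_lustre_ingress_rule(vpc_cidr)],
    )
    return group["GroupId"]


# Building the rule needs no credentials; 172.31.0.0/16 is the example CIDR from above.
rule = fsx_lustre_ingress_rule("172.31.0.0/16")
print(rule)
```

Calling `create_fsx_security_group("vpc-...", "172.31.0.0/16")` with your own VPC ID does the actual work and needs AWS credentials with EC2 permissions.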

2. Create FSx Filesystem

  1. Go to the FSx Lustre Console and click Create Filesystem.
  2. On the next screen, select FSx Lustre:

Select FSx Lustre

  3. On the next page, you’ll see an option for Persistent. This is the new PERSISTENT_2 type; it’s simply called Persistent in the AWS console and PERSISTENT_2 in the API, to maintain backwards compatibility.

Persistent_2

  4. Make sure to enable LZ4 compression; this both reduces the space your data takes up on the filesystem and improves performance.

Data Compression

  5. Make sure to check the box under Data Repository Import/Export; this enables future linking to S3.

DRA Import

  6. Create the filesystem in the same subnet as AWS ParallelCluster and attach the FSx Lustre security group created earlier.

Subnet/SG Setup
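
The same filesystem can be requested through the API. Below is a rough sketch with boto3's `create_file_system`; the helper names, default capacity and default throughput are my own picks, not something the console steps dictate. The capacity and throughput checks reflect the values FSx accepted for SSD PERSISTENT_2 when I wrote this, so double-check them against the FSx docs. The Data Repository Import/Export checkbox has no equivalent in this sketch.

```python
def persistent_2_request(subnet_id, security_group_id, storage_gib=1200, throughput=125):
    """Build create_file_system parameters for a PERSISTENT_2 filesystem with LZ4."""
    # Assumption: capacity must be 1200 GiB or a multiple of 2400 GiB.
    if storage_gib != 1200 and (storage_gib <= 0 or storage_gib % 2400 != 0):
        raise ValueError("storage_gib must be 1200 or a multiple of 2400")
    # Assumption: throughput per TiB of storage, in MB/s, for SSD PERSISTENT_2.
    if throughput not in (125, 250, 500, 1000):
        raise ValueError("throughput must be one of 125, 250, 500, 1000")
    return {
        "FileSystemType": "LUSTRE",
        "StorageCapacity": storage_gib,
        "SubnetIds": [subnet_id],  # same subnet as the cluster
        "SecurityGroupIds": [security_group_id],  # the group that opens TCP 988
        "LustreConfiguration": {
            "DeploymentType": "PERSISTENT_2",
            "PerUnitStorageThroughput": throughput,
            "DataCompressionType": "LZ4",
        },
    }


def create_persistent_2(subnet_id, security_group_id):
    """Send the request and return the new filesystem ID (needs AWS credentials)."""
    import boto3  # imported lazily so the request builder works offline

    fsx = boto3.client("fsx")
    response = fsx.create_file_system(**persistent_2_request(subnet_id, security_group_id))
    return response["FileSystem"]["FileSystemId"]


# Placeholder IDs, only to show the shape of the request.
request = persistent_2_request("subnet-0example", "sg-0example")
print(request["LustreConfiguration"])
```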

3. Attach Filesystem to AWS ParallelCluster

  1. After the filesystem has finished creating, grab the filesystem ID from the FSx console:

  2. Update the config file to include that filesystem ID:
SharedStorage:
  - Name: FsxLustre
    StorageType: FsxLustre
    MountDir: /shared
    FsxLustreSettings:
      FileSystemId: fs-12345678910 # <- fs id from the fsx console
  3. If you’re using pcluster-manager, simply check the box next to Use Existing Filesystem and select the filesystem you just created:

ParallelCluster UI
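
If you'd rather not refresh the console while waiting for the filesystem to finish creating, a small polling loop does the job. This is a sketch, not part of the original setup: `wait_until_available` is a name I made up, and it expects an FSx client such as `boto3.client("fsx")`, whose `describe_file_systems` call reports the filesystem's `Lifecycle` state.

```python
import time


def wait_until_available(fsx_client, file_system_id, poll_seconds=30, max_polls=60):
    """Poll describe_file_systems until the filesystem reports AVAILABLE."""
    for _ in range(max_polls):
        response = fsx_client.describe_file_systems(FileSystemIds=[file_system_id])
        lifecycle = response["FileSystems"][0]["Lifecycle"]
        if lifecycle == "AVAILABLE":
            return lifecycle
        if lifecycle in ("FAILED", "DELETING"):
            # No point waiting on a filesystem that will never become usable.
            raise RuntimeError(f"{file_system_id} is in state {lifecycle}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"{file_system_id} not AVAILABLE after {max_polls} polls")
```

Usage would look like `wait_until_available(boto3.client("fsx"), "fs-12345678910")`, with your own filesystem ID in place of the placeholder from the config above.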

Once the filesystem has been created, you can link it to an S3 bucket. This allows you to sync data back and forth between the filesystem and S3. It also allows you to delete the filesystem and preserve its contents on S3.

  1. In the FSx Console, navigate to Filesystem > Data repositories and click Create data repository association.

Create DRA

  2. Link to an S3 bucket in the same region:
  • Filesystem Path: Path on the FSx filesystem, relative to the mount point, to sync back to S3. Make this / to sync the entire filesystem.
  • Data Repository Path: Path on S3 to store synced content, e.g. s3://bucket/ will replicate to the root of the bucket.

Link S3 Bucket

  3. Now you can select your import & export settings:

Import and Export
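
The data repository association can also be created through the API. Here's a minimal sketch with boto3's `create_data_repository_association`. The helper names are hypothetical, and turning on every import and export event (new, changed, deleted) is just one possible choice for the settings above, not a recommendation from the console.

```python
def dra_request(file_system_id, bucket_uri, file_system_path="/"):
    """Build create_data_repository_association parameters linking a path to S3."""
    if not bucket_uri.startswith("s3://"):
        raise ValueError("bucket_uri must look like s3://bucket/ or s3://bucket/prefix/")
    if not file_system_path.startswith("/"):
        raise ValueError("file_system_path must start with /")
    events = ["NEW", "CHANGED", "DELETED"]
    return {
        "FileSystemId": file_system_id,
        "FileSystemPath": file_system_path,  # "/" syncs the entire filesystem
        "DataRepositoryPath": bucket_uri,
        "BatchImportMetaDataOnCreate": True,  # load existing S3 metadata up front
        "S3": {
            # Assumption: sync every kind of change in both directions.
            "AutoImportPolicy": {"Events": list(events)},
            "AutoExportPolicy": {"Events": list(events)},
        },
    }


def link_bucket(file_system_id, bucket_uri):
    """Create the association and return its ID (needs AWS credentials)."""
    import boto3  # imported lazily so the request builder works offline

    fsx = boto3.client("fsx")
    response = fsx.create_data_repository_association(
        **dra_request(file_system_id, bucket_uri)
    )
    return response["Association"]["AssociationId"]


# Placeholder filesystem ID and bucket, only to show the shape of the request.
dra = dra_request("fs-12345678910", "s3://bucket/")
print(dra["FileSystemPath"], "->", dra["DataRepositoryPath"])
```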
