Merge pull request #59 from gkaf89/refactor/data-transfer
Refactor/data transfer
gkaf89 authored Jun 7, 2024
2 parents 137c614 + 99c8625 commit 6ac1a6c
Showing 2 changed files with 74 additions and 18 deletions.
83 changes: 72 additions & 11 deletions docs/data/transfer.md
@@ -155,6 +155,10 @@ $> rsync -avzu aion-cluster:experiments/parallel_run /path/to/local/directory

As always, see the [man page](https://linux.die.net/man/1/rsync) or `man rsync` for more details.

??? info "Windows Subsystem for Linux (WSL)"
In WSL, the home directory in Linux virtual machines is not your home directory in Windows. If you want to access the files that you downloaded with `rsync` inside a Linux virtual machine, please consult the [WSL documentation](https://learn.microsoft.com/en-us/windows/wsl/) and the [file system](https://learn.microsoft.com/en-us/windows/wsl/filesystems) section in particular.
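For example, Windows drives are mounted under `/mnt` inside WSL, so data fetched with `rsync` into your WSL home directory can be copied to the Windows side with something like the following (a sketch; the drive letter and Windows user name are assumptions):
```bash
# Copy results from the WSL home directory to the Windows user profile
# (assumes Windows is installed on drive C: and the Windows user is "jdoe")
cp -r ~/parallel_run /mnt/c/Users/jdoe/Downloads/
```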


### Data Transfer within Project directories

The ULHPC facility features a [Global Project directory `$PROJECTHOME`](../filesystems/gpfs.md#global-project-directory-projecthomeworkprojects) hosted within the [GPFS/SpectrumScale](../filesystems/gpfs.md) file-system.
@@ -166,6 +170,8 @@ You have to pay particular attention when using `rsync` to transfer data withi
end="<!--end-warning-clusterusers-->"
%}

??? info "Debugging quota issues"
Sometimes, when working with files from projects that have run out of quota, you may encounter errors due to insufficient space. Note that even if a single directory is copied from a project without changing its group, all future files created in the copied directory will count towards the group quota (unless explicitly specified otherwise). In such cases, just set the correct ownership using `chown -R <username>:<groupname> <directory>`.
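For instance (hypothetical user, group, and directory names; substitute your own):
```bash
# Check which user and group currently own the copied directory
ls -ld ~/copied_data
# Reassign the whole tree to your user and the correct project group
chown -R jdoe:p_myproject ~/copied_data
```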


## Using MobaXterm (Windows)

@@ -267,7 +273,7 @@ diskutil umount ~/ulhpc

## Transfers between long term storage and the HPC facilities

The university provides central data storage services for all employees and students. The data are stored securely on the university campus and are **managed by the IT department**. The storage servers most commonly used at the university are
The university provides **central data storage** services for all employees and students. The data are stored securely on the university campus and are **managed by the IT department**. The storage servers most commonly used at the university are

- Atlas (atlas.uni.lux) for staff members, and
- Poseidon (poseidon.uni.lux) for students.
@@ -277,18 +283,73 @@ For more details on the university central storage, you can have a look at
- [Usage of Atlas and Poseidon](https://hpc.uni.lu/accessing_central_university_storage), and
- [Backup of your files on Atlas](https://hpc.uni.lu/moving_files_to_the_central_university_storage).

!!! info "Connecting to data storage services from a personal machine"
The examples presented here are targeted to the university HPC machines. To connect to a university central data storage with a (Linux) personal machine from outside of the university network, you need to start first a VPN connection.
!!! info "Connecting to central data storage services from a personal machine"
The examples presented here are targeted at the university HPC machines. To connect to the university central data storage from a (Linux) personal machine outside of the university network, you first need to start a VPN connection.

The SMB shares exported for directories in the central data storage are meant to be accessed interactively. Unlike mounting with `sshfs`, you will always need to enter your password to access a directory from the central data storage, so you cannot use SMB shares in job scripts; transfer your data manually after your job has finished. You can mount directories from the central data storage on the login nodes, and access the central data storage through the `smbclient` interface from both the login nodes and, within interactive jobs, the compute nodes.

The following commands are for Atlas, but commands for Poseidon are similar.

### Mounting an SMB share to a login node

The UL HPC team provides the `smb-storage` script to mount SMB shares on the login nodes.

- To mount your default user directory from the default `users` share (only for staff members), call in a shell session
```bash
smb-storage mount name.surname
```
and your directory will be mounted to the default mount location:
```
~/atlas.uni.lux-users-name.surname
```
- To mount a project share `project_name`, call in a shell session
```bash
smb-storage mount name.surname --project project_name
```
and the share will be mounted in the default mount location:
```
~/atlas.uni.lux-project_name
```
- To unmount any share, simply call the `unmount` subcommand with the mount point path, for instance
```bash
smb-storage unmount ~/atlas.uni.lux-users-name.surname
```
or:
```bash
smb-storage unmount ~/atlas.uni.lux-project_name
```

The `smb-storage` script provides optional flags to modify the default behavior (a combined example follows the list):

- `--help` or `-h` prints information about the usage and options of the script,
- `--server <server url>` or `-s <server url>` specifies the server from which the SMB share is mounted (use `--server poseidon.uni.lux` to mount a share from Poseidon),
- `--project <project name>` or `-p <project name>` mounts the share `<project name>` (by default the `users` share is mounted),
- `--mountpoint <path>` or `-m <path>` selects the path where the share will be mounted (the default location is `~/<server url>-<project name>-<linked directory>`),
- `--debug` or `-d` prints details of the operations performed by the mount script.
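For instance, to mount the project share `my_project` from Poseidon at a custom mount point (a sketch assuming the flags can be combined as described above; `my_project` and the path are placeholders):
```bash
smb-storage mount name.surname --server poseidon.uni.lux --project my_project --mountpoint ~/poseidon-my_project
```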

!!! info "Best practices"

Mounted SMB shares are only available on the login node; on compute nodes the mount point appears as a dead symbolic link. This is by design: you can only mount SMB shares on login nodes because SMB shares are meant to be used in interactive sessions.

Mounted shares remain available as long as the login session where the share was mounted remains active. You can mount shares in a `tmux` session on a login node, and access the share from any other session on that node.
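For example, a share can be kept alive in a named `tmux` session (a sketch; the session name is arbitrary):
```bash
# Start a named tmux session on the login node
tmux new -s storage
# Inside the tmux session, mount the share; it stays available as long as
# this session is alive. Detach with Ctrl-b d and reattach with: tmux attach -t storage
smb-storage mount name.surname
```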

??? info "Details of the mounting process"
There exists a default SMB share `users` where all staff members have a directory named after their user name (`name.surname`). If no share is specified with the `--project` flag, the default share `users` is mounted in a specially named directory in `/run/user/${UID}/gvfs`, and a symbolic link to the user folder is created in the mount location by the `smb-storage` script.

All projects have a share named after the project name. If a project is specified with the `--project` flag, the project share is mounted in a specially named directory in `/run/user/${UID}/gvfs`, and a symbolic link to the whole project directory is created in the mount location by the `smb-storage` script.

During unmounting, the symbolic links are deleted by the `smb-storage` script, and the shares mounted in `/run/user/${UID}/gvfs` are unmounted and their mount points are removed. **If a session with mounted SMB shares terminates without unmounting the shares, the shares in `/run/user/${UID}/gvfs` will be unmounted and their mount points deleted, but the symbolic links created by `smb-storage` must be removed manually.**
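If that happens, you can inspect the remaining `gvfs` mount points and delete the stale link by hand (a sketch; the link name depends on the share that was mounted):
```bash
# List the gvfs mount points still active for your session
ls /run/user/$(id -u)/gvfs
# Remove the leftover symbolic link in your home directory
rm ~/atlas.uni.lux-users-name.surname
```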


The following commands are for Atlas, but Poseidon commands are similar.

### Accessing SMB shares with `smbclient`

You can connect to your Atlas and browse you personal directories with the command,
The `smbclient` program is available on both login and compute nodes. On compute nodes, the only way to access SMB shares is through the client program. You can connect to Atlas and browse your personal directories with the command,
```
$ smbclient //atlas.uni.lux/users --directory='name.surname' [email protected]
smbclient //atlas.uni.lux/users --directory='name.surname' [email protected]
```
and you can access project directories with the command
```
$ smbclient //atlas.uni.lux/name_of_your_project_shared_directory [email protected]
smbclient //atlas.uni.lux/name_of_your_project_shared_directory [email protected]
```
given that you have the rights to access the root of the project directory.
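Within an interactive session, the usual `smbclient` commands are available to list the share, change into a remote directory, download a file, and leave the session; a short sketch (directory and file names are placeholders):
```
smb: \> ls
smb: \> cd experiments
smb: \> get results.csv
smb: \> exit
```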

@@ -308,20 +369,20 @@ The patterns used in `mget`/`mput` are either normal file names, or glob expressions

Connecting to an interactive SAMBA session means that you have to maintain a shell session dedicated to SAMBA. However, it saves you from entering your password for every operation. If you would like to perform a single operation and exit, you can avoid the interactive session with the `--command` flag. For instance,
```
$ smbclient //atlas.uni.lux/users --directory='name.surname' [email protected] --command='get "full path/to/remote file.txt" "full path/to/local file.txt"'
smbclient //atlas.uni.lux/users --directory='name.surname' [email protected] --command='get "full path/to/remote file.txt" "full path/to/local file.txt"'
```
copies a file from the SAMBA directory to the local machine. Notice the use of double quotes to handle file names with spaces. Similarly, the following command uploads a local file to the SAMBA directory:
```
$ smbclient //atlas.uni.lux/users --directory='name.surname' [email protected] --command='put "full path/to/local file.txt" "full path/to/remote file.txt"'
smbclient //atlas.uni.lux/users --directory='name.surname' [email protected] --command='put "full path/to/local file.txt" "full path/to/remote file.txt"'
```

Moving whole directories is a bit more involved, as it requires setting some session state variables (`recurse ON` and `prompt OFF`), in both interactive and non-interactive sessions. To download a directory for instance, use
```bash
$ smbclient //atlas.uni.lux/users --directory='name.surname' [email protected] --command='recurse ON; prompt OFF; mget "full path/to/remote directory" "full path/to/local directory"'
smbclient //atlas.uni.lux/users --directory='name.surname' [email protected] --command='recurse ON; prompt OFF; mget "full path/to/remote directory" "full path/to/local directory"'
```
and to upload a directory use
```bash
$ smbclient //atlas.uni.lux/users --directory='name.surname' [email protected] --command='recurse ON; prompt OFF; mput "full path/to/remote local" "full path/to/remote directory"'
smbclient //atlas.uni.lux/users --directory='name.surname' [email protected] --command='recurse ON; prompt OFF; mput "full path/to/local directory" "full path/to/remote directory"'
```
respectively. The session option

9 changes: 2 additions & 7 deletions docs/filesystems/isilon.md
@@ -6,14 +6,9 @@ OneFS, A global _low_-performance [Dell/EMC Isilon](https://www.dellemc.com/en-u

<!--intro-end-->

In 2014, the [IT Department of the University](https://wwwen.uni.lu/universite/presentation/organigrammes/organigramme_rectorat_administration_centrale/service_informatique_de_l_universite), the [UL HPC]
(https://hpc.uni.lu/about/team.html) and the [LCSB](http://wwwen.uni.lu/lcsb/) join their forces (and their funding) to acquire a scalable and modular
NAS solution able to sustain the need for an internal big data storage, _i.e._ provides space for centralized data and backups of all devices used by the UL staff and all rese
arch-related data, including the one proceed on the [UL HPC](https://hpc.uni.lu) platform.
In 2014, the [IT Department of the University](https://wwwen.uni.lu/universite/presentation/organigrammes/organigramme_rectorat_administration_centrale/service_informatique_de_l_universite), the [UL HPC](https://hpc.uni.lu/about/team.html) and the [LCSB](http://wwwen.uni.lu/lcsb/) joined their forces (and their funding) to acquire a scalable and modular NAS solution able to sustain the need for internal big data storage, _i.e._ to provide space for centralized data and backups of all devices used by the UL staff and all research-related data, including the data processed on the [UL HPC](https://hpc.uni.lu) platform.

At the end of a public call for tender released in 2014, the [EMC Isilon](http://www.emc.com/isilon) system was finally selected with an effective deployment in 2015.
It is physically hosted in the new CDC (Centre de Calcul) server room in the [Maison du Savoir](http://www.fonds-belval.lu/index.php?lang=en&page=3&sub=2).
Composed by a large number of disk enclosures featuring the [OneFS](http://www.emc.com/en-us/storage/isilon/onefs-operating-system.htm) File System, it currently offers an **effective** capacity of 3.360 PB.
At the end of a public call for tender released in 2014, the [EMC Isilon](http://www.emc.com/isilon) system was finally selected, with an effective deployment in 2015. It is physically hosted in the new CDC (Centre de Calcul) server room in the [Maison du Savoir](http://www.fonds-belval.lu/index.php?lang=en&page=3&sub=2). Composed of a large number of disk enclosures featuring the [OneFS](http://www.emc.com/en-us/storage/isilon/onefs-operating-system.htm) File System, it currently offers an **effective** capacity of 3.360 PB.

A secondary Isilon cluster, acquired in 2020 and deployed in 2021, duplicates this setup redundantly.

