Recipe for impatient users

This section explains how to run a full mbta offline production (preprocessing, filtering and postfiltering) at CCIN2P3.

Danger

Every Virgo user has write permission on the mbta production directory. Be careful and double-check the paths you use in the production scripts.

Virgo code

MBTA and its dependencies are centrally installed in

/pbs/throng/virgo/virgoApp

See the Quick software overview. If needed, you can also Install mbta at ccin2p3 (optional).

O4 mbta production directories

The O4 offline production is run in

/sps/virgo/USERS/mbta/O4

The organisation of the folders is the following:

HLVoffline:      Input h(t) data. The ffl files point either to local copies of gwf files
                 or to gwf files on cvmfs. The ffl files must be updated regularly.

cat-flags:       List of good quality segments to be used for analysis.

idq-offline:     iDQ offline timeseries (Analysis Ready frames contain online iDQ time series).

rpo4-offline:    FrSimEvent and naked h(t) of Reed's injections.

preprocessing:   Preprocessed data or injections. For injections, the naked h(t) injection signal is superposed on the data.
                 Reduction to 4096 Hz / 1s frame, apply gating and cat1 vetoes,
                 add iDQ offline time series.
                 The folder is organised as
                 chunkXX / mbtaD / vYY
                           mbtaI
                            ssmD
                            ssmI

banks:           Template banks for mbta filtering.

far_pastro_main: Files used for pastro and FAR assignment at the postfiltering step.

results:         Outputs of mbta filtering and postfiltering. The folder is organised as
                 chunkXX / mbtaD / runYY
                           mbtaI
                            ssmD
                            ssmI

Generally, when different runs or versions are present in a folder, a README file gives more details on the different versions available.
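
Because every Virgo user can write to the official tree, a small guard in your own wrapper scripts can help avoid accidents. The sketch below is only an illustration: the function names and the ALLOW_OFFICIAL variable are hypothetical and are not part of the production scripts. Only the path /sps/virgo/USERS/mbta/O4 comes from this page.

```shell
# Official production tree named in this section.
OFFICIAL_DIR="/sps/virgo/USERS/mbta/O4"

# Return 0 if the given path is the official tree or lies inside it.
is_official_dir() {
   case "${1%/}/" in
      "$OFFICIAL_DIR"/*) return 0 ;;
      *)                 return 1 ;;
   esac
}

# Hypothetical guard: refuse to continue on the official tree unless
# the user has explicitly set ALLOW_OFFICIAL=yes.
guard_dir() {
   if is_official_dir "$1" && [ "${ALLOW_OFFICIAL:-no}" != "yes" ]; then
      echo "refusing to write to official directory: $1" >&2
      return 1
   fi
}
```

For instance, guard_dir "$PREDIR" || exit 1 placed at the top of a personal wrapper would stop the script when PREDIR points to the official directory.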

Install the production scripts

mkdir -p ~/virgo
cd ~/virgo
git clone git@git.ligo.org:morgan.lethuillier/mbta-offline-prod.git prod

You may need to load your ssh key first with

eval $(ssh-agent)
ssh-add $HOME/.ssh/your_preferred_private_key

See this doc for more guidance on how to generate your ssh key and add it to your git account.

Three directories should have been created:

~/virgo/prod/scripts : some very simple scripts useful for managing the production
~/virgo/prod/v5 : main production script
~/virgo/prod/v5/cfg.O4 : mbta skeleton config files

Add the following lines to your .bashrc to add the scripts to your $PATH:

# ------ functions to add or remove a directory to/from the $PATH and avoid duplication
pathadd() {
   newdir=${1%/}
   if [ -d "$1" ] && ! echo $PATH | grep -E -q "(^|:)$newdir($|:)" ; then
      if [ "$2" = "after" ] ; then
         PATH="$PATH:$newdir"
      else
         PATH="$newdir:$PATH"
      fi
   fi
}

pathrm() {
   PATH="$(echo $PATH | sed -e "s;\(^\|:\)${1%/}\(:\|\$\);\1\2;g" -e 's;^:\|:$;;g' -e 's;::;:;g')"
}

# ------ add scripts directory to $PATH
pathadd "."
pathadd "${HOME}/virgo/prod/scripts" after
export PATH
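
After opening a new shell, you may want to check that the scripts are actually found through $PATH. The helper below is a minimal sketch: check_scripts is a hypothetical name and is not provided by the repository.

```shell
# Report, for each name given, whether it can be found through $PATH.
# Returns 1 if at least one name is missing.
check_scripts() {
   local missing=0 s
   for s in "$@"; do
      if command -v "$s" >/dev/null 2>&1; then
         echo "found:   $s"
      else
         echo "missing: $s"
         missing=1
      fi
   done
   return $missing
}
```

A possible call, assuming these two scripts live in ~/virgo/prod/scripts, is check_scripts setmbta.sh createChunkProperties.sh.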

Configure your environment

See here for more details on setmbta.sh

source setmbta.sh [LOCAL/CENTRAL]  [MBTA_VERSION]
# example: source setmbta.sh CENTRAL v5r28
# example: source setmbta.sh CENTRAL last

CENTRAL is for a version installed centrally (in /pbs/throng/virgo/virgoApp/).

LOCAL is for a version you have installed locally (in general in ~/virgo/App).

MBTA_VERSION is the name of the mbta version. last corresponds to the most recent version.

Update the chunk definition

If needed, update the chunk definition with createChunkProperties.sh. Get the latest official gps time definition of the chunk from:

https://git.ligo.org/cbc-allsky-searches/chunk-definitions/-/blob/main/o4-chunks.txt

Then

createChunkProperties.sh o4-chunks.txt

Two files will be produced: ChunkPropertiesO4.sh, containing the chunk definition to be used by the production script, and a more human-readable ChunkPropertiesO4.txt like this:

chunk     start          end  duration  livetime            Start Date              End Date
 1368241218   1369497618   1256400         0   2023-05-16_03:00:00   2023-05-30_16:00:00
 1368975618   1370097052   1121434   1016508   2023-05-24_15:00:00   2023-06-06_14:30:34
 1370097052   1371306087   1209035   1053810   2023-06-06_14:30:34   2023-06-20_14:21:09
 1371306087   1372546081   1239994   1084877   2023-06-20_14:21:09   2023-07-04_22:47:43
 1372546081   1373711624   1165543    999904   2023-07-04_22:47:43   2023-07-18_10:33:26
 1373711624   1374936316   1224692    969644   2023-07-18_10:33:26   2023-08-01_14:44:58
 1374936316   1376144770   1208454    965380   2023-08-01_14:44:58   2023-08-15_14:25:52
 1376144770   1377355516   1210746   1097777   2023-08-15_14:25:52   2023-08-29_14:44:58
 1377355516   1378565116   1209600   1087781   2023-08-29_14:44:58   2023-09-12_14:44:58
 1378565116   1379774716   1209600   1086368   2023-09-12_14:44:58   2023-09-26_14:44:58
 1379774716   1380984316   1209600   1071156   2023-09-26_14:44:58   2023-10-10_14:44:58
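
The dates in this table can be cross-checked by hand from the GPS times. The sketch below assumes GNU date (the -d "@..." syntax) and the 18 s GPS-UTC leap-second offset valid for the dates listed here. The function names are illustrative and are not part of createChunkProperties.sh.

```shell
# Unix time of the GPS epoch (1980-01-06 00:00:00 UTC).
GPS_EPOCH_UNIX=315964800
# GPS-UTC offset in seconds, valid since 2017 (assumed for these O4 dates).
GPS_UTC_LEAP=18

# Convert a GPS time to a UTC date in the format used in the table.
gps_to_utc() {
   date -u -d "@$(( $1 + GPS_EPOCH_UNIX - GPS_UTC_LEAP ))" +%Y-%m-%d_%H:%M:%S
}

# Duration of a chunk in seconds, from its start and end GPS times.
chunk_duration() {
   echo $(( $2 - $1 ))
}

gps_to_utc 1368241218                 # 2023-05-16_03:00:00
chunk_duration 1368241218 1369497618  # 1256400
```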

Preprocessing

In ~/virgo/prod/scripts you will find the scripts to run the preprocessing (createPrepro.sh), the filtering step (createProd.sh) and the postprocessing (createPost.sh).

These scripts will create directories with all the configuration files needed to run a production, based on the skeleton config files found in ~/virgo/prod/v5/cfg.O4 or in git.

You may want to change the configuration in the skeleton files to fit your particular needs. The different keys used in the config files are described in the mbta doc.

1- Modify the script:

In the first lines of createPrepro.sh, you can choose whether to run on local h(t) data (for instance in /sps/virgo/USERS/mbta/O4/HLVoffline/local/analysis-ready) or on data distributed over stashcache via cvmfs, the type of frames (aggregated frames or analysis ready frames), the root directory where your production dir will be created, and the version of the Fd library:

# Where are the frames ? LOCAL disk or on CVMFS ?
#FRAME_DIR="LOCAL"
FRAME_DIR="CVMFS"

# Type of frames ? AGGREGATED or ANALYSIS-READY ?
#FRAME_TYPE="AGGREGATED"
FRAME_TYPE="ANALYSIS-READY"

# Production directory
PREDIR="/sps/virgo/USERS/$USER/mbta/O4"         # Your production directory
#PREDIR="/sps/virgo/USERS/mbta/O4"              # "Official" production directory - Take care of not deleting an existing prod

# FdIOServer version
FD_VER="/pbs/throng/virgo/virgoApp/Fd/v8r62p1"  # Fd central version
#FD_VER="${HOME}/virgo/App/Fd/v8r59"            # Fd local version

2- Run the script:

Then run the script with the type of analysis (standard or subsolar), the type of data (data or injections), the chunk number, a version number to identify your preprocessing and, for injections, the injection trains you want to analyze (0-2 corresponds to trains 0 and 2, spaced by 12 s, and 1-3 to trains 1 and 3, spaced by 12 s).

createPrepro.sh
usage:   createPrepro.sh [STD/SSM] [DAT/INJ] [chunk number] [prepro version number] ([injection batch])
      [injection batch] is either 0-2 or 1-3 or 0123
example: createPrepro.sh SSM DAT 17 03      | new preprocessing v03 of data of chunk 17 for ssm analysis
example: createPrepro.sh STD INJ 17 03 1-3  | new preprocessing v03 of injection batchs 1 and 3 of chunk 17 for std analysis

A new directory will be created. For instance createPrepro.sh std dat 01 50 will create the following directory: /sps/virgo/USERS/$USER/mbta/O4/preprocessing/chunk01/mbtaD/v50/

3- Launch the preprocessing on slurm:

In this new directory, if you opted for local frame files (FRAME_DIR="LOCAL"), you should have two files: prepro.cfg, containing the mbta configuration for the preprocessing, and prepro.submit, the script to be submitted to the slurm batch farm. Submit the job with:

sbatch submit.sh > submit.id

submit.id contains the slurm jobid and will be used for the checks after the job.
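
As an illustration of how the saved ID can be reused, the sketch below extracts the job ID from such a file. It assumes the default sbatch message ("Submitted batch job <id>") was redirected to the file. jobid_from_file is a hypothetical helper, not one of the production scripts.

```shell
# Print the last field of every non-empty line of the given file,
# i.e. the job ID when the line reads "Submitted batch job <id>".
jobid_from_file() {
   awk 'NF { print $NF }' "$1"
}
```

The result can then be passed to the standard slurm commands, for instance squeue -j "$(jobid_from_file submit.id)" while the job is pending or running, or sacct -j "$(jobid_from_file submit.id)" once it has finished.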

If you are not familiar with slurm, you can find more information in the ccin2p3 documentation pages.

If you opted for remote frame files hosted on stashcache (FRAME_DIR="CVMFS"), three additional files are present: prepro.sh, setup.sh and simple-token-refresh.sh, the latter refreshing every 3 hours the LVK identification token (scitoken) that grants you access to these remote private data. Before submitting your job, execute every line of setup.sh in your terminal to create a first scitoken. See ligo.org authentication for more details about these tokens. Once you have access to the remote data, you can submit your slurm job:

sbatch submit.sh > submit.id

Filtering

1- Modify the script:

In the first lines of createProd.sh, you can choose the root directory where your production dir will be created, the number of parallel jobs used to run on the full template bank, the version of mbta to be used, and some configuration keys for iDQ. iDQ info, such as a reweighted SNR, will be added to the event, but no selection is done at this step. A selection on an iDQ reweighted (combined) ranking statistic can be done at the postprocessing step.

#------Main parameters------------------------------------------------
#
PRODIR="/sps/virgo/USERS/$USER/mbta/O4"  # Your production directory
#PRODIR="/sps/virgo/USERS/mbta/O4"       # "Official" production directory - Take care of not deleting an existing prod
NJOBS_STD=42                             # Nb of jobs for the splitting of the std template bank
NJOBS_SSM=100                            # Nb of jobs for the splitting of the ssm template bank
NJOBS_SINGLE=3                           # Nb of jobs for the splitting of the std single band template bank
CFGDIR="cfg.O4/AR_Frames"                # Directory containing the skeleton of mbta config files
#CFGDIR="cfg.O4/Aggregated_Frames"       # Directory containing the skeleton of mbta config files
#MBTA_VER="local v5r40p1"                # Version of mbta to be used (see setmbta.sh for more informations)
MBTA_VER="central last"                  # Version of mbta to be used (see setmbta.sh for more informations)
IDQ_VAR=30                               # idq variable: 10 = mean, 20 = mean excluding 0 values, 30 = max;  over defined duration
IDQ_TIME_AFTER=-1                        # time after  the event to calculate idq var. If -1, take ringdown duration
IDQ_TIME_BEFORE=-1                       # time before the event to calculate idq var. If -1, take template duration
IDQ_ALPHA=1.5                            # reranked snr idqSNR**2 = rwSNR**2 - alpha * IDQ_VAR
                                         # (variable is kept in the event but no threshold on this reranked snr is applied at filtering step)


#------iDQ channels-----------------------------------
#
# iDQ infos will be added to the event only if a channel is defined here
IDQ_CHANNEL_V1=""
IDQ_CHANNEL_L1="L1:IDQ-LOGLIKE_OVL_10_2048"
IDQ_CHANNEL_H1="H1:IDQ-LOGLIKE_OVL_10_2048"
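
To get a feeling for the size of the iDQ penalty, the formula quoted in the comments above can be evaluated by hand. This is only a sketch of that arithmetic, assuming that IDQ_VAR in the formula stands for the value of the selected iDQ statistic (mean or max) rather than the selector code 10/20/30. The function name is hypothetical and mbta itself may treat edge cases differently.

```shell
# Evaluate idqSNR**2 = rwSNR**2 - alpha * idq_value
# Arguments: rwSNR, alpha, value of the iDQ statistic.
idq_snr2() {
   awk -v rw="$1" -v alpha="$2" -v idq="$3" \
       'BEGIN { print rw * rw - alpha * idq }'
}

idq_snr2 10 1.5 8   # 100 - 12 = 88
```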

2- Run the script:

Then run the script with the type of analysis (standard or subsolar), the type of data (data or injections), the chunk number, the version of the preprocessed data to be used, a version number to identify your filtering and, for injections, the injection trains you want to analyze (0-2 corresponds to trains 0 and 2, spaced by 12 s, and 1-3 to trains 1 and 3, spaced by 12 s).

usage:   createProd.sh [STD/SSM]  [DAT/INJ]  [chunk number] [prepro version] [prod version] ([injection batch])
          [injection batch] is either 0-2 or 1-3 or 0123
example: createProd.sh SSM DAT 17 02 01      | to create a ssm analysis filtering prod version run01 to be run on chunk 17 preprocessed data version v02
example: createProd.sh STD INJ 33 01 03 1-3  | to create a std analysis filtering prod version run03 to be run on chunk 33 preprocessed version 01 of injections batchs 1 and 3

A new directory will be created. For instance createProd.sh std inj 01 03 07 0-2 will create the following directory: /sps/virgo/USERS/$USER/mbta/O4/results/chunk01/mbtaI/run07/batchs-0-2 with the config files and scripts needed to filter the preprocessed data from /sps/virgo/USERS/$USER/mbta/O4/preprocessing/chunk01/mbtaI/v03/batchs-0-2
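
The naming convention seen in these examples can be summarised in a few lines of shell. This is a sketch inferred from the example paths on this page (STD/DAT gives mbtaD, STD/INJ gives mbtaI, and by analogy SSM gives ssmD and ssmI). The function names are hypothetical and the real createProd.sh may build its paths differently.

```shell
# Map the analysis and data type to the folder name used in the tree.
search_dir() {
   local analysis data prefix suffix
   analysis=$(echo "$1" | tr '[:lower:]' '[:upper:]')
   data=$(echo "$2" | tr '[:lower:]' '[:upper:]')
   case "$analysis" in
      STD) prefix="mbta" ;;
      SSM) prefix="ssm" ;;
      *)   return 1 ;;
   esac
   case "$data" in
      DAT) suffix="D" ;;
      INJ) suffix="I" ;;
      *)   return 1 ;;
   esac
   echo "${prefix}${suffix}"
}

# results_dir ROOT ANALYSIS DATA CHUNK RUN [BATCH]
results_dir() {
   local sub dir
   sub=$(search_dir "$2" "$3") || return 1
   dir="$1/results/chunk$4/$sub/run$5"
   if [ -n "${6:-}" ]; then
      dir="$dir/batchs-$6"
   fi
   echo "$dir"
}

results_dir "/sps/virgo/USERS/$USER/mbta/O4" std inj 01 07 0-2
```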

3- Launch the filtering on slurm:

In this new directory, you should have all the config files *.cfg and slurm submit files *.slurm to run the filtering on each part of the template bank. Eight threads will be used to filter each part of the bank. You could either run these 8 threads on 8 cpu cores (1 filtering process per slurm job), or run 16 threads on 8 cpu cores (2 filtering processes per job). This second possibility is used in production. It results in a longer runtime of the jobs but is more economical in cpu usage. To submit all the filtering jobs in this case:

source sub_mbtaI_EPJ2.sh > sub_mbtaI_EPJ2.id

sub_mbtaI_EPJ2.id contains the slurm jobid and will be used for the checks after the job.

To choose the first possibility, at the cost of a higher cpu usage:

source sub_mbtaI_EPJ1.sh > sub_mbtaI_EPJ1.id
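
The cpu cost of the two submission modes above can be estimated with simple arithmetic. The sketch below assumes that each slurm job reserves 8 cores in both modes, as described above, and that the bank parts are packed 1 or 2 per job. filtering_cores is an illustrative name, not a production script.

```shell
# filtering_cores NPARTS PROCESSES_PER_JOB [CORES_PER_JOB]
# Print the number of slurm jobs and the total number of reserved cores.
filtering_cores() {
   local parts=$1 per_job=$2 cores=${3:-8}
   local jobs=$(( (parts + per_job - 1) / per_job ))   # round up
   echo "$jobs jobs, $(( jobs * cores )) cores"
}

filtering_cores 42 1   # 42 jobs, 336 cores
filtering_cores 42 2   # 21 jobs, 168 cores
```

With NJOBS_STD=42, running 2 filtering processes per job would therefore reserve half as many cores, at the price of a longer runtime per job.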

Postprocessing

1- Modify the script:

During the postprocessing, all the filtering job outputs (*.gwf files) will be merged, a second clustering will be performed, and a noise reduction applied (either excess rate or snr excess). In the first lines of createPost.sh, you can choose the root directory where your postprocessing directory will be created, the directory containing the skeleton config files, the version of mbta to be used, the far model to be applied to the events and some configuration keys for iDQ. If an iDQ channel name is specified, the corresponding snr and combined ranking statistics will be reweighted using iDQ info.

PRODIR="/sps/virgo/USERS/$USER/mbta/O4"  # Your production directory
#PRODIR="/sps/virgo/USERS/mbta/O4"       # "Official" production directory - Take care of not deleting an existing prod
CFGDIR="cfg.O4/AR_Frames"                # Directory containing the skeleton of mbta config files
MBTA_VER="central v5r44p3"               # Version of mbta to be used (see setmbta.sh for more informations)
#MBTA_VER="central last"                 # Version of mbta to be used (see setmbta.sh for more informations)
IDQ_VAR=30                               # idq variable: 10 = mean, 20 = mean excluding 0 values, 30 = max;  over defined duration
IDQ_TIME_AFTER=-1                        # time after  the event to calculate idq var. If -1, take ringdown duration
IDQ_TIME_BEFORE=-1                       # time before the event to calculate idq var. If -1, take template duration
IDQ_ALPHA=1.5                            # reranked snr idqSNR**2 = rwSNR**2 - alpha * IDQ_VAR
                                         # (variable is kept in the event but no threshold on this reranked snr is applied at filtering step)

# Reranking of the SNR will be done if a channel is defined here
IDQ_CHANNEL_V1=""
IDQ_CHANNEL_L1=""
IDQ_CHANNEL_H1=""
#IDQ_CHANNEL_L1="L1:IDQ-LOGLIKE_OVL_10_2048"
#IDQ_CHANNEL_H1="H1:IDQ-LOGLIKE_OVL_10_2048"

# Superchunk version to be used to set FAR in STD search
STD_FARFILE="/sps/virgo/USERS/mbta/O4/results/MYSC/mbtaD/run07v01_v5r44p3_9000/FARofPastro.gwf"

# Superchunk version to be used to set FAR in SSM search
SSM_FARFILE="/sps/virgo/USERS/mbta/O4/results/MYSC/ssmD/run03v03_v5r47p2/FARofCRS_with_clustering

2- Run the script:

Then run the script with the type of analysis (standard or subsolar), the type of data (data or injections), the chunk number, the version of the filtered data to be used, a version number to identify your postprocessing and, for injections, the injection trains you want to analyze (0-2 corresponds to trains 0 and 2, spaced by 12 s, and 1-3 to trains 1 and 3, spaced by 12 s).

Usage:   createPost.sh [STD/SSM]  [DAT/INJ]  [chunk number] [filtering version] [postfiltering version] [injection batch]
         [injection batch] is either 0-2 or 1-3 or 0123
Example: createPost.sh SSM DAT 17 00 01     | to run an new v01 postprocessing of chunk 17 of the SSM DATA production r00
Example: createPost.sh STD INJ 01 02 01 1-3 | to run an new v01 postprocessing of chunk 01 of the STD injections (batchs 1 and 3) production r02

A new directory will be created. For instance createPost.sh std inj 01 07 01 0-2 will create the following directory: /sps/virgo/USERS/$USER/mbta/O4/results/chunk01/mbtaI/run07/batchs-0-2/postfiltering/v01 with the config files and scripts needed to postprocess the filtered data from /sps/virgo/USERS/$USER/mbta/O4/results/chunk01/mbtaI/run07/batchs-0-2

3- Launch the postprocessing on slurm:

In this new directory, you should have a config file *_postfiltering.cfg and a slurm submit file *_postfiltering.slurm to run the postprocessing. This single job needs only one core, but its memory requirement can be high (48 GB or more). To submit the postprocessing job (replace mbtaI by mbtaD, ssmI or ssmD to suit your case):

sbatch mbtaI_postfiltering.slurm > mbtaI_postfiltering.id

mbtaI_postfiltering.id contains the slurm jobid and will be used for the checks after the job.