Basic checks of the production jobs
This section explains how to perform basic sanity checks of the offline production jobs for preprocessing, filtering and postprocessing.
Preprocessing
The preprocessing consists of a single slurm job. It reads the aggregated or analysis-ready frames (in general from cvmfs, with a frame length that depends on the interferometer) and creates new 1s frames with the h(t) timeseries downsampled to 4096 Hz. It applies gating (i.e. a filter that smoothly sets h(t) to 0) when the range drops below a configurable fraction of the median range. The gating is also applied to gps times in bad data quality segments (see here for more details). The common h(t) injections produced by the rates & pop group are also superposed on the data during this step. Either 2 injection trains spaced by 12s or 4 injection trains spaced by 6s are used.
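The gating criterion described above can be illustrated with a minimal sketch. It only shows the decision "range below a fraction of the median range"; the function name, the sample range values and the fraction 0.6 are assumptions made for illustration, and the smoothing filter actually applied to h(t) is not reproduced here.

```python
from statistics import median

def seconds_to_gate(range_per_second, fraction):
    """Return the indices of the samples whose range is below fraction * median range."""
    threshold = fraction * median(range_per_second)
    return [i for i, r in enumerate(range_per_second) if r < threshold]

# Hypothetical range values (Mpc), one per second; the drop at indices 3 and 4 mimics a glitch.
ranges = [150.0, 152.0, 149.0, 60.0, 75.0, 151.0, 150.0]

# 0.6 is an illustrative value, not the production configuration.
gated = seconds_to_gate(ranges, fraction=0.6)
print(gated)
```

With these made-up values the median range is 150 Mpc, so the threshold is 90 Mpc and only the two low samples are flagged.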
Use checkPrepro.sh (already present in your ~/virgo/prod/scripts and in your $PATH if you followed the instructions to configure your production environment).
Usage: checkPrepro.sh [STD/SSM] [DAT/INJ] [chunk number] [prod version number] ([injection batch])
[injection batch] is either 0-2 or 1-3 or 0123
Example: checkPrepro.sh SSM DAT 17 01 | to check the SSM preprocessing version r01 on chunk 17 data
Example: checkPrepro.sh STD INJ 01 02 1-3 | to check the STD prepro r02 on chunk 01 of the 1-3 injection batches
The script will run the following checks:
Check the presence of the slurm logfile and FdIOServer logfile.
Check all the gwf files (preprocessed data and trend data) are there.
Check the exit status of the slurm job.
Check the presence of FATAL error in the logfile.
Check in the logfile, the number of preprocessed frames. It should be equal to the official chunk duration (minus 2 frames dropped at the end of preprocessing due to the way the gating works).
Check the number of missing frames given by error messages in the logfile. When preprocessing analysis-ready frames, no missing frames are expected. When both H1 and L1 are out of observing mode, you should get an output preprocessed frame containing only the H1:MBTA_CAT1 and L1:MBTA_CAT1 channels, all set to 0.
Check that the gps start time of the first frame and the end time of the last frame in the output gwf files correspond to the official chunk duration (minus the 2 frames dropped at the end of preprocessing).
Check the integrity of the gwf file with FrCheck with sequential or direct (TOC) access.
Check missing frames in each file. The total number of frames must be equal to the total number of preprocessed frames given in the logfile.
Example of output:
checkPrepro.sh std dat 01 08
Some FrCheck files already exist. Do you want tu use them to save time (y/n) ? y
Running /pbs/home/m/morgan/virgo/prod/v5/checkPrepro.sh STD DAT on chunk01 run08
Preprocessing directory: /sps/virgo/USERS/mbta/O4/preprocessing/chunk01/mbtaD/v08
O4 chunk01
Start: Wed May 24 15:00:00 UTC 2023 GPS Time = 1368975618
End: Wed May 31 15:00:00 UTC 2023 GPS Time = 1369580418
Duration: 604800
Livetime (1): 543539 ( 89.9 %)
Livetime (2): 211255 ( 34.9 %)
Slurm jobid ............................... 48664472
Is slurm logfile present ? ................ OK
Is FdIOServer logfile present ? ........... OK
7 gwf files present ? ..................... OK
1 trendfile present ? ..................... OK
Slurm exit status ? ....................... OK
FATAL error in logfile ? .................. OK
Chunk duration (-2 frames) ................ 604798
Preprocessed frames ....................... 604798
Missing frames (wrt chunk duration)........ 0
Missing frames from log ................... 0
Analyzed start time .......................1368975618 OK
Analyzed end time .........................1369580416 OK (difference <=2 frames)
Checking integrity of file (sequential).... HLV-4kHz-1368975618-24382.gwf ..... OK 24382 frames 0 missing frames
Checking integrity of file (sequential).... HLV-4kHz-1369000000-100000.gwf .... OK 100000 frames 0 missing frames
Checking integrity of file (sequential).... HLV-4kHz-1369100000-100000.gwf .... OK 100000 frames 0 missing frames
Checking integrity of file (sequential).... HLV-4kHz-1369200000-100000.gwf .... OK 100000 frames 0 missing frames
Checking integrity of file (sequential).... HLV-4kHz-1369300000-100000.gwf .... OK 100000 frames 0 missing frames
Checking integrity of file (sequential).... HLV-4kHz-1369400000-100000.gwf .... OK 100000 frames 0 missing frames
Checking integrity of file (sequential).... HLV-4kHz-1369500000-80416.gwf ..... OK 80416 frames 0 missing frames
Is total nb of frames consistent with the logfiles ? .......................... OK 604798 frames 0 missing frames
Checking integrity of file (TOC)........... HLV-4kHz-1368975618-24382.gwf ..... OK 24382 frames 0 missing frames
Checking integrity of file (TOC)........... HLV-4kHz-1369000000-100000.gwf .... OK 100000 frames 0 missing frames
Checking integrity of file (TOC)........... HLV-4kHz-1369100000-100000.gwf .... OK 100000 frames 0 missing frames
Checking integrity of file (TOC)........... HLV-4kHz-1369200000-100000.gwf .... OK 100000 frames 0 missing frames
Checking integrity of file (TOC)........... HLV-4kHz-1369300000-100000.gwf .... OK 100000 frames 0 missing frames
Checking integrity of file (TOC)........... HLV-4kHz-1369400000-100000.gwf .... OK 100000 frames 0 missing frames
Checking integrity of file (TOC)........... HLV-4kHz-1369500000-80416.gwf ..... OK 80416 frames 0 missing frames
Is total nb of frames consistent with the logfiles ? .......................... OK 604798 frames 0 missing frames
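The frame-count arithmetic behind these checks can be reproduced by hand from the file names in the example output above. The sketch below assumes the gwf files are named prefix-gpsStart-duration.gwf, as they appear in that output; the helper name covered_span is hypothetical and this is not the code of checkPrepro.sh.

```python
import re

# Assumed naming convention: <prefix>-<gps start>-<duration in s>.gwf
GWF_NAME = re.compile(r"^(?P<prefix>.+)-(?P<start>\d+)-(?P<duration>\d+)\.gwf$")

def covered_span(filenames):
    """Return (first GPS start, last GPS end, total seconds) from gwf file names."""
    spans = []
    for name in filenames:
        m = GWF_NAME.match(name)
        if m is None:
            raise ValueError(f"unexpected gwf file name: {name}")
        start = int(m.group("start"))
        spans.append((start, start + int(m.group("duration"))))
    spans.sort()
    total = sum(end - start for start, end in spans)
    return spans[0][0], spans[-1][1], total

# File names copied from the example output.
files = [
    "HLV-4kHz-1368975618-24382.gwf",
    "HLV-4kHz-1369000000-100000.gwf",
    "HLV-4kHz-1369100000-100000.gwf",
    "HLV-4kHz-1369200000-100000.gwf",
    "HLV-4kHz-1369300000-100000.gwf",
    "HLV-4kHz-1369400000-100000.gwf",
    "HLV-4kHz-1369500000-80416.gwf",
]
chunk_start, chunk_duration = 1368975618, 604800

first, last, total = covered_span(files)
expected = chunk_duration - 2  # 2 frames are dropped at the end of preprocessing
print(first == chunk_start, last == chunk_start + expected, total == expected)
```

Since the preprocessed frames are 1s long, the number of seconds covered by the file names equals the expected number of frames (604798 for this chunk).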
Filtering
The matched filtering of all the dual-band templates of the bank is performed by 42 parallel jobs ordered by decreasing template duration. Job 00 processes the longest templates, Job 41 the shortest ones. Job 90 and Job 91 perform the matched filtering of the single-band templates. All these jobs use the gated preprocessed data.
For some templates with a risk of the signal triggering the gating, a special matched filtering job (Job R4) is run using ungated preprocessed data. The triggers of this job are double counted with those of the other jobs.
A clustering is also performed during this step for each filtering job. All the triggers with a time separation of less than 100 ms (in the current configuration) are clustered. Full info is only kept for the highest SNR trigger of each cluster, and a reduced (configurable) set of variables is kept for the other triggers.
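As an illustration of the clustering step, here is a minimal sketch. It assumes one possible reading of the rule, in which a trigger joins the current cluster when it is less than 100 ms after the previous trigger; the actual MBTA algorithm may define the time separation differently. The function names and the trigger values are hypothetical.

```python
def cluster_triggers(triggers, window=0.1):
    """Group (time, snr) triggers; a trigger less than `window` seconds
    after the previous one joins the same cluster (assumed rule)."""
    clusters = []
    for trig in sorted(triggers):
        if clusters and trig[0] - clusters[-1][-1][0] < window:
            clusters[-1].append(trig)
        else:
            clusters.append([trig])
    return clusters

def loudest(cluster):
    """Return the highest SNR trigger of a cluster (the one keeping full info)."""
    return max(cluster, key=lambda t: t[1])

# Hypothetical triggers: (time in s, SNR)
triggers = [(10.00, 6.1), (10.05, 9.3), (10.12, 7.0), (10.50, 5.5)]
clusters = cluster_triggers(triggers)
kept = [loudest(c) for c in clusters]
print(len(clusters), kept)
```

In this example the first three triggers form one cluster, represented by the trigger with SNR 9.3, and the last trigger forms its own cluster.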
To check the output of these 45 filtering jobs, use checkProd.sh (already present in your ~/virgo/prod/scripts and in your $PATH if you followed the instructions to configure your production environment). Reminder: the filtering jobs read the 1s preprocessed frames in input and write 100s filtered frames in output.
Usage: checkProd.sh [STD/SSM] [DAT/INJ] [chunk number] [prod version number] ([injection batch])
[injection batch] is either 0-2 or 1-3 or 0123
Example: checkProd.sh SSM DAT 17 01 | to check the SSM production version r01 on chunk 17 data
Example: checkProd.sh STD INJ 01 02 1-3 | to check the STD prod. v. r02 on chunk 01 of the 1-3 injection batches
The script will run the following checks:
Check the presence of all the slurm logfiles and MbtaRT logfiles.
Check all the gwf files are present.
Check the exit status of each slurm job.
Check the presence of FATAL error in the logfiles.
Check the number of input frames in the logfile. It should be equal to the number of frames created by the preprocessing job.
Check the number of output frames in the logfile. It is reduced by a factor of ~100 due to the change of frame duration from 1s to 100s.
Check the integrity of the gwf file with FrCheck with sequential or direct (TOC) access.
Check number of frames in each gwf file. It must be equal to the number of output frames given in the logfile.
Check that the gps start time of the first frame and the end time of the last frame in each gwf file correspond to the official chunk duration. The difference has to be less than one output frame duration (i.e. 100s).
The numbers of HL coincidences and of single H or L triggers are given to help detect an abnormally low or high rate in one job (no automatic check is applied). The lower rate in job R4 is expected given the lower number of templates and higher masses required.
Example of output:
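The relation between the analyzed time span and the number of 100s output frames can be sketched as follows. The sketch assumes that the output frames are aligned on GPS multiples of 100s, which is consistent with the filtering start and end times (1368975600 and 1369580500, 6049 frames) reported in the postprocessing example output of this page, but is not stated explicitly. The function name is hypothetical.

```python
def n_output_frames(gps_start, gps_end, frame_len=100):
    """Number of frame_len-second frames, aligned on multiples of frame_len,
    needed to cover [gps_start, gps_end)."""
    first = (gps_start // frame_len) * frame_len   # round start down
    last = -(-gps_end // frame_len) * frame_len    # round end up
    return (last - first) // frame_len

# Start and end of the preprocessed data of chunk01 (see the preprocessing example).
n_frames = n_output_frames(1368975618, 1369580416)
print(n_frames)
```

This is why the number of output frames is only approximately the number of input frames divided by 100: the first and last output frames are in general only partially filled.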
Running /pbs/home/m/morgan/virgo/prod/v5/checkProd.sh STD INJ on chunk01 run01
Production directory: /sps/virgo/USERS/mbta/O4/results/chunk01/mbtaI/run01/batchs-1-3/
Jobs to be checked: 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 90 91 R4
----mbta id
| ----slurm jobid
| | ----presence of slurm logfile
| | | ----presence of MbtaRT logfile
| | | | ----presence of gwf file
| | | | | ----slurm exit status
| | | | | | ----FATAL error
| | | | | | | ----Nb of input frames (in log)
| | | | | | | | ----Nb of output frames (in log)
| | | | | | | | | ----FrCheck (sequential)
| | | | | | | | | | ----FrCheck (TOC)
| | | | | | | | | | | ----analyzed times
| | | | | | | | | | | | ----nb of HL triggers
| | | | | | | | | | | | | ----nb of H1 triggers
| | | | | | | | | | | | | | ----nb of L1 triggers
| | | | | | | | | | | | | | |
JOB ID OUT LOG GWF EXI FAT F-IN F-OUT SEQ TOC TIM HL H L
00 47230072 OK OK OK OK OK 604767 6049 OK OK OK 4242 589 4360
01 47230074 OK OK OK OK OK 604767 6049 OK OK OK 4116 532 3909
02 47230075 OK OK OK OK OK 604767 6049 OK OK OK 4800 982 3853
03 47230077 OK OK OK OK OK 604767 6049 OK OK OK 3987 774 3076
04 47230078 OK OK OK OK OK 604767 6049 OK OK OK 3926 689 3042
05 47230080 OK OK OK OK OK 604767 6049 OK OK OK 4048 774 3221
06 47230081 OK OK OK OK OK 604767 6049 OK OK OK 4274 798 3298
07 47230083 OK OK OK OK OK 604767 6049 OK OK OK 3297 429 2494
08 48211225 OK OK OK OK OK 604767 6049 OK OK OK 3401 444 2720
09 47230089 OK OK OK OK OK 604767 6049 OK OK OK 3320 417 2790
10 47230090 OK OK OK OK OK 604767 6049 OK OK OK 3471 483 2739
11 47230092 OK OK OK OK OK 604767 6049 OK OK OK 3464 470 2793
12 47230093 OK OK OK OK OK 604767 6049 OK OK OK 3414 448 2686
13 47230095 OK OK OK OK OK 604767 6049 OK OK OK 3557 480 2800
14 47230096 OK OK OK OK OK 604767 6049 OK OK OK 3834 545 2934
15 47230098 OK OK OK OK OK 604767 6049 OK OK OK 3881 591 3190
16 47230099 OK OK OK OK OK 604767 6049 OK OK OK 4064 594 3482
17 47230102 OK OK OK OK OK 604767 6049 OK OK OK 4196 626 3639
18 47230103 OK OK OK OK OK 604767 6049 OK OK OK 3811 586 3200
19 47230105 OK OK OK OK OK 604767 6049 OK OK OK 3814 533 3333
20 47230106 OK OK OK OK OK 604767 6049 OK OK OK 3294 535 2591
21 47230106 OK OK OK OK OK 604767 6049 OK OK OK 3396 483 2879
22 47230105 OK OK OK OK OK 604767 6049 OK OK OK 3687 546 3440
23 47230103 OK OK OK OK OK 604767 6049 OK OK OK 3534 540 3431
24 47230102 OK OK OK OK OK 604767 6049 OK OK OK 3585 593 3255
25 47230099 OK OK OK OK OK 604767 6049 OK OK OK 3342 558 3204
26 47230098 OK OK OK OK OK 604767 6049 OK OK OK 3475 640 3175
27 47230096 OK OK OK OK OK 604767 6049 OK OK OK 3377 621 3165
28 47230095 OK OK OK OK OK 604767 6049 OK OK OK 3204 627 3194
29 47230093 OK OK OK OK OK 604767 6049 OK OK OK 3025 580 3016
30 47230092 OK OK OK OK OK 604767 6049 OK OK OK 2989 602 3124
31 47230090 OK OK OK OK OK 604767 6049 OK OK OK 3057 594 3307
32 47230089 OK OK OK OK OK 604767 6049 OK OK OK 2945 566 3369
33 48211225 OK OK OK OK OK 604767 6049 OK OK OK 3026 606 3754
34 47230083 OK OK OK OK OK 604767 6049 OK OK OK 3118 639 3778
35 47230081 OK OK OK OK OK 604767 6049 OK OK OK 2882 551 3525
36 47230080 OK OK OK OK OK 604767 6049 OK OK OK 2905 615 3682
37 47230078 OK OK OK OK OK 604767 6049 OK OK OK 2955 664 3935
38 47230077 OK OK OK OK OK 604767 6049 OK OK OK 3198 821 4421
39 47230075 OK OK OK OK OK 604767 6049 OK OK OK 3480 914 5623
40 47230074 OK OK OK OK OK 604767 6049 OK OK OK 4325 1396 8119
41 47230072 OK OK OK OK OK 604767 6049 OK OK OK 7324 3766 14952
90 47230108 OK OK OK OK OK 604767 6049 OK OK OK 1308 1471 2806
91 47230110 OK OK OK OK OK 604767 6049 OK OK OK 5375 2874 10523
R4 47230112 OK OK OK OK OK 604767 6049 OK OK OK 370 550 1979
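With 45 rows to read, it can help to scan the summary table programmatically. The sketch below is not part of checkProd.sh; it assumes the whitespace-separated column layout shown above. The marker printed by the script when a check fails is not shown in this section, so any value other than OK is treated as a failure, and the failing row used in the example is made up.

```python
HEADER = "JOB ID OUT LOG GWF EXI FAT F-IN F-OUT SEQ TOC TIM HL H L".split()
STATUS_COLUMNS = ("OUT", "LOG", "GWF", "EXI", "FAT", "SEQ", "TOC", "TIM")

def failed_checks(table_text):
    """Map each job id to the list of status columns that are not 'OK'."""
    problems = {}
    for line in table_text.strip().splitlines():
        fields = line.split()
        # Skip the header and any line that does not have one field per column.
        if len(fields) != len(HEADER) or fields[0] == "JOB":
            continue
        row = dict(zip(HEADER, fields))
        bad = [col for col in STATUS_COLUMNS if row[col] != "OK"]
        if bad:
            problems[row["JOB"]] = bad
    return problems

# Two rows copied from the table above, plus one hypothetical failing row (job XX).
sample = """
JOB ID OUT LOG GWF EXI FAT F-IN F-OUT SEQ TOC TIM HL H L
00 47230072 OK OK OK OK OK 604767 6049 OK OK OK 4242 589 4360
R4 47230112 OK OK OK OK OK 604767 6049 OK OK OK 370 550 1979
XX 00000000 OK OK OK NO OK 604767 6049 OK NO OK 0 0 0
"""
problems = failed_checks(sample)
print(problems)
```

The trigger counts in the last three columns are deliberately left out of the automatic test, since the script itself applies no automatic check to them.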
Postprocessing
The postprocessing (also called postfiltering) consists of a single slurm job. It reads the output of the filtering step and merges the 100s frames of the 45 filtering jobs. A second step of clustering is performed. An excess rate or SNR excess correction is applied to reduce the background. iDQ downranking can also be applied at this step (not currently the default).
Background files are produced (mbtaD_bkg-*.gwf or mbtaI_bkg-*.gwf) with all the single trigger pseudo events which will be used to derive the FAR. pAstro and FAR are assigned to foreground events (reading the pAstro and FAR parameterization from files) and event files are produced (mbtaD_postfiltered-*.gwf or mbtaI_postfiltered-*.gwf).
A trendfile containing channels with a reduced sampling rate is also produced (mbtaD_postfiltered_trend-*.gwf or mbtaI_postfiltered_trend-*.gwf) for quick checks.
Use checkPost.sh (already present in your ~/virgo/prod/scripts and in your $PATH if you followed the instructions to configure your production environment).
Usage: checkPost.sh [STD/SSM] [DAT/INJ] [chunk number] [filtering version] [postfiltering version] ([injection batch])
[injection batch] is either 0-2 or 1-3 or 0123
Example: checkPost.sh SSM DAT 17 00 01 | to check the v01 postprocessing of chunk 17 of the SSM DATA production r00
Example: checkPost.sh STD INJ 01 02 01 1-3 | to check the v01 postprocessing of chunk 01 of the STD injections (batches 1 and 3) production r02
The script will run the following checks:
Check the presence of the slurm logfile and MbPostFiltering logfile.
Check all the gwf files (background, foreground and trend) are there.
Check the exit status of the slurm job.
Check the presence of FATAL error in the logfile.
Check in the logfile that all the filtering gwf files have been merged.
Check the number of processed frames in the logfile.
Check the number of missing frames given by error messages in the logfile.
Check that the gps start time of the first frame and the end time of the last frame are the same as in the filtering jobs and are compatible with the official chunk limits, i.e. less than 1 frame length difference (currently 100s).
Check the integrity of the gwf files (background, foreground and trend) with FrCheck with sequential or direct (TOC) access.
Check that the total number of frames in the gwf files is equal to the total number of frames in the filtering step.
Example of output:
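The time-limit checks of this list reduce to simple comparisons, sketched here with the GPS times of chunk01 reported in the example output of this section. The function name is hypothetical and the sketch only mirrors the rule stated above (less than one 100s frame of difference); it is not the code of checkPost.sh.

```python
FRAME_LEN = 100  # s, current duration of a filtered frame

def compatible(post_time, chunk_time, frame_len=FRAME_LEN):
    """True if the two GPS times differ by less than one frame length."""
    return abs(post_time - chunk_time) < frame_len

# Official chunk01 limits and postprocessing/filtering limits from the example output.
chunk_start, chunk_end = 1368975618, 1369580418
filt_start, filt_end = 1368975600, 1369580500
post_start, post_end = 1368975600, 1369580500

same_as_filtering = (post_start, post_end) == (filt_start, filt_end)
start_ok = compatible(post_start, chunk_start)  # difference of 18 s
end_ok = compatible(post_end, chunk_end)        # difference of 82 s
print(same_as_filtering, start_ok, end_ok)
```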
Running /pbs/home/m/morgan/virgo/prod/v5/checkPost.sh STD DAT on chunk01 run03 for version v01 of postprocessing
O4 chunk01
Start: Wed May 24 15:00:00 UTC 2023 GPS Time = 1368975618
End: Wed May 31 15:00:00 UTC 2023 GPS Time = 1369580418
Duration: 604800
Livetime: 0
Filtering directory: /sps/virgo/USERS/mbta/O4/results/chunk01/mbtaD/run03
Filtering (100s frames)
Start: 1368975600
End: 1369580500
Duration: 604900s (6049 frames)
Postprocessing directory: /sps/virgo/USERS/mbta/O4/results/chunk01/mbtaD/run03/postfiltering/v01
Slurm jobid ............................... 48743825
Is slurm logfile present ? ................ OK
Is mbta logfile present ? ................. OK
7 fg gwf files present ? .................. OK
7 bg gwf files present ? .................. OK
1 trendfile present ? ..................... OK
Slurm exit status ? ....................... OK
FATAL error in logfile ? .................. OK
45 filtering files merged in logfile ...... OK
Each filtering file present in logfile ? .. OK
Processed frames (from log) ............... OK (604900s)
Missing frames from log ................... 0
Postprocessing start time ................. 1368975600
Is the same as filtering start time ? ..... OK
Is compatible with chunk start time ? ..... OK (difference < 100s)
Postprocessing end time ................... 1369580500
Is the same as filtering end time ? ....... OK
Is compatible with chunk end time ? ....... OK (difference < 100s)
Checking integrity of fg file (sequential).... mbtaD_postfiltered-1368975600-24400.gwf ..... OK 244 frames
Checking integrity of fg file (sequential).... mbtaD_postfiltered-1369000000-100000.gwf .... OK 1000 frames
Checking integrity of fg file (sequential).... mbtaD_postfiltered-1369100000-100000.gwf .... OK 1000 frames
Checking integrity of fg file (sequential).... mbtaD_postfiltered-1369200000-100000.gwf .... OK 1000 frames
Checking integrity of fg file (sequential).... mbtaD_postfiltered-1369300000-100000.gwf .... OK 1000 frames
Checking integrity of fg file (sequential).... mbtaD_postfiltered-1369400000-100000.gwf .... OK 1000 frames
Checking integrity of fg file (sequential).... mbtaD_postfiltered-1369500000-80500.gwf ..... OK 805 frames
Is total nb of processed frames equal to nb of filtering frames ............................ OK 6049 frames
Checking integrity of fg file (TOC)........... mbtaD_postfiltered-1368975600-24400.gwf ..... OK 244 frames
Checking integrity of fg file (TOC)........... mbtaD_postfiltered-1369000000-100000.gwf .... OK 1000 frames
Checking integrity of fg file (TOC)........... mbtaD_postfiltered-1369100000-100000.gwf .... OK 1000 frames
Checking integrity of fg file (TOC)........... mbtaD_postfiltered-1369200000-100000.gwf .... OK 1000 frames
Checking integrity of fg file (TOC)........... mbtaD_postfiltered-1369300000-100000.gwf .... OK 1000 frames
Checking integrity of fg file (TOC)........... mbtaD_postfiltered-1369400000-100000.gwf .... OK 1000 frames
Checking integrity of fg file (TOC)........... mbtaD_postfiltered-1369500000-80500.gwf ..... OK 805 frames
Is total nb of processed frames equal to nb of filtering frames ............................ OK 6049 frames
Checking integrity of bg file (sequential).... mbtaD_bkg-1368975600-24400.gwf .............. OK 244 frames
Checking integrity of bg file (sequential).... mbtaD_bkg-1369000000-100000.gwf ............. OK 1000 frames
Checking integrity of bg file (sequential).... mbtaD_bkg-1369100000-100000.gwf ............. OK 1000 frames
Checking integrity of bg file (sequential).... mbtaD_bkg-1369200000-100000.gwf ............. OK 1000 frames
Checking integrity of bg file (sequential).... mbtaD_bkg-1369300000-100000.gwf ............. OK 1000 frames
Checking integrity of bg file (sequential).... mbtaD_bkg-1369400000-100000.gwf ............. OK 1000 frames
Checking integrity of bg file (sequential).... mbtaD_bkg-1369500000-80500.gwf .............. OK 805 frames
Is total nb of processed frames equal to nb of filtering frames ............................ OK 6049 frames
Checking integrity of bg file (TOC)........... mbtaD_bkg-1368975600-24400.gwf .............. OK 244 frames
Checking integrity of bg file (TOC)........... mbtaD_bkg-1369000000-100000.gwf ............. OK 1000 frames
Checking integrity of bg file (TOC)........... mbtaD_bkg-1369100000-100000.gwf ............. OK 1000 frames
Checking integrity of bg file (TOC)........... mbtaD_bkg-1369200000-100000.gwf ............. OK 1000 frames
Checking integrity of bg file (TOC)........... mbtaD_bkg-1369300000-100000.gwf ............. OK 1000 frames
Checking integrity of bg file (TOC)........... mbtaD_bkg-1369400000-100000.gwf ............. OK 1000 frames
Checking integrity of bg file (TOC)........... mbtaD_bkg-1369500000-80500.gwf .............. OK 805 frames
Is total nb of processed frames equal to nb of filtering frames ............................ OK 6049 frames
Checking trend file (sequential)........ mbtaD_postfiltered_trend-1368950000-650000.gwf .... OK 13 frames
Are time limits compatible with filtering start and end times ? ............................ OK
Checking trend file (TOC)............... mbtaD_postfiltered_trend-1368950000-650000.gwf .... OK 13 frames
Are time limits compatible with filtering start and end times ? ............................ OK
An additional check