Snakemake

run rule locally

localrules: foo

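The localrules directive can appear multiple times in a Snakefile. Rules listed there run on the submitting host instead of being sent to the cluster.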

Using lambda functions inside rules

Below are two of my examples that use lambda functions to make a choice based on wildcard values.

  1. Uses lambda to define input file needed depending on wildcard 'data_source'
rule extract_by_chr:
    input:
        lambda wildcards: str(EXAC_RAW_PATH / "ExAC.r1.sites.vep.vcf.gz") \
            if (wildcards.data_source.lower()=='exac') \
            else \
            str(GNOMAD_RAW_PATH / "exomes"/ "gnomad.exomes.r2.0.1.sites.vcf.gz")
    output:
        "{path}/split_by_chr/{data_source}_raw_chr_{chr_no}.vcf.gz"
    shell:
        "..."
  2. Uses lambda to choose the parameter flag required
rule extract_Adjusted_AC_AN_exac:
    output:
        "extract_AC_AN/{filter_used}/11_table.tsv.gz"
    params:
        filter_parameter = lambda wildcards: '--remove-filtered-all' if wildcards.filter_used=='pass'  else ''
    shell:
        "..."
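
Once the condition grows beyond a one-liner, the same choices can be written as named functions, which Snakemake accepts in place of a lambda for input and params. The sketch below is plain Python so it can be tried outside a Snakefile; the two base paths are hypothetical placeholders, and SimpleNamespace only stands in for the wildcards object Snakemake would pass.

```python
from pathlib import Path
from types import SimpleNamespace

# Hypothetical locations; the real EXAC_RAW_PATH / GNOMAD_RAW_PATH
# are defined elsewhere in the Snakefile.
EXAC_RAW_PATH = Path("/data/exac")
GNOMAD_RAW_PATH = Path("/data/gnomad")


def raw_vcf(wildcards):
    """Return the raw VCF path that matches the 'data_source' wildcard."""
    if wildcards.data_source.lower() == "exac":
        return str(EXAC_RAW_PATH / "ExAC.r1.sites.vep.vcf.gz")
    return str(GNOMAD_RAW_PATH / "exomes" / "gnomad.exomes.r2.0.1.sites.vcf.gz")


def filter_flag(wildcards):
    """Return the filtering flag only when 'filter_used' is 'pass'."""
    return "--remove-filtered-all" if wildcards.filter_used == "pass" else ""


# Stand-in for the wildcards object, for trying the functions by hand.
wc = SimpleNamespace(data_source="ExAC", filter_used="pass")
print(raw_vcf(wc))
print(filter_flag(wc))
```

In a rule these would be referenced by name, e.g. input: raw_vcf and params: filter_parameter = filter_flag.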

Using a dictionary with wildcards as keys

input:
    bams = lambda wildcards: dict_name[wildcards.sample]
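
A minimal sketch of the same lookup as a named function, assuming a hypothetical SAMPLE_BAMS dictionary in place of dict_name. Its only addition over the lambda is a clearer error when a sample is missing from the dictionary; SimpleNamespace stands in for the wildcards object.

```python
from types import SimpleNamespace

# Hypothetical mapping of sample name -> BAM files (stands in for dict_name).
SAMPLE_BAMS = {
    "sampleA": ["sampleA_L1.bam", "sampleA_L2.bam"],
    "sampleB": ["sampleB.bam"],
}


def bams_for(wildcards):
    """Look up the BAM files for the 'sample' wildcard."""
    try:
        return SAMPLE_BAMS[wildcards.sample]
    except KeyError:
        known = ", ".join(sorted(SAMPLE_BAMS))
        raise ValueError(
            f"Unknown sample '{wildcards.sample}'; known samples: {known}"
        ) from None


print(bams_for(SimpleNamespace(sample="sampleB")))
```

In a rule it would be used as bams = bams_for.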

zip with >2 wildcards

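Nest multiple expand calls: the inner one pairs wildcards with zip, and the outer one combines each result with the remaining wildcard. The doubled braces around typevar keep it unexpanded until the outer call. Example: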

expand(
    expand(
        ANALYSIS + "/{sample_g}_vs_{sample_t}/Stelka/results/variants/somatic.{{typevar}}_Filtered",
        zip, sample_g=GERMLINE, sample_t=TUMOR),
    typevar=TYPEVAR)

Source
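
To see which paths the nested call should yield, here is a plain-Python imitation of it. It does not call Snakemake; it only mimics the pair-then-cross behaviour with str.format, and the sample names and ANALYSIS value are made up for illustration.

```python
# Hypothetical values, for illustration only.
ANALYSIS = "analysis"
GERMLINE = ["G1", "G2"]
TUMOR = ["T1", "T2"]
TYPEVAR = ["snvs", "indels"]

PATTERN = (
    ANALYSIS
    + "/{sample_g}_vs_{sample_t}/Stelka/results/variants/somatic.{{typevar}}_Filtered"
)


def pair_then_cross(pattern, germline, tumor, typevars):
    """Pair germline/tumor samples (like zip), then cross with every typevar."""
    # Inner step: '{{typevar}}' survives this format call as '{typevar}'.
    paired = [
        pattern.format(sample_g=g, sample_t=t) for g, t in zip(germline, tumor)
    ]
    # Outer step: every paired pattern is combined with every typevar.
    return [p.format(typevar=tv) for p in paired for tv in typevars]


targets = pair_then_cross(PATTERN, GERMLINE, TUMOR, TYPEVAR)
for target in targets:
    print(target)
```

With two sample pairs and two variant types this gives four paths; unpaired combinations such as G1_vs_T2 never appear.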

Using a bash script that submits its own job in a snakemake rule

Let's say script.sh submits its own job to a cluster. To use this script in a snakemake rule, add the -K flag to the bsub call inside the script, so that bsub blocks until the submitted job finishes, and then use the wait command in the snakemake rule.

rule xxx:
    shell:
        """
        ./script.sh  &
        wait
        """

Sourcing .bashrc as part of shell command in snakemake rule

set +u; source /path/to/.bashrc; set -u

Source.

--dag and custom print messages

If a snakemake pipeline prints custom messages, the --dag visualization fails because the printed text ends up mixed into the dot output. To make them play nice, write the printed messages as dot comments.

According to dot manual,

Comments may be /* C-like */ or // C++-like.

Example:

print (f'// yo yo yo: "{x}"')
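
A message that spans several lines needs the comment prefix on every line, not only the first. A small helper can take care of that; dot_comment is a hypothetical name used only for this sketch.

```python
def dot_comment(message):
    """Prefix every line of the message with '// ' so dot reads it as a comment."""
    return "\n".join(f"// {line}" for line in str(message).splitlines())


x = "sample_01"  # hypothetical value
print(dot_comment(f'yo yo yo: "{x}"\nsecond line of the message'))
```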

Snakemake hangs when a job times out or is cancelled

This is a known problem, as reported here. I'm copying my answer from that site, which solves the issue:

Snakemake doesn't recognize every job status in slurm (or in other job schedulers). To bridge this gap, snakemake provides the --cluster-status option, which takes a custom script that reports the status of a job. As per snakemake's documentation:

 --cluster-status

Status command for cluster execution. This is only considered in combination with the --cluster flag.
If provided, Snakemake will use the status command to determine if a job has finished successfully or failed.
For this it is necessary that the submit command provided to --cluster returns the cluster job id.
Then, the status command will be invoked with the job id.
Snakemake expects it to return ‘success’ if the job was successful, ‘failed’ if the job failed and ‘running’ if the job still runs.

Example shown in snakemake's doc to use this feature:

#!/usr/bin/env python
import subprocess
import sys

jobid = sys.argv[1]

output = str(subprocess.check_output("sacct -j %s --format State --noheader | head -1 | awk '{print $1}'" % jobid, shell=True).strip())

running_status=["PENDING", "CONFIGURING", "COMPLETING", "RUNNING", "SUSPENDED"]
if "COMPLETED" in output:
  print("success")
elif any(r in output for r in running_status):
  print("running")
else:
  print("failed")

To use this script, make it executable and call snakemake similar to below, where status.py is the script above.

$ snakemake all --cluster "sbatch --cpus-per-task=1 --parsable" --cluster-status ./status.py
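
To check how a given slurm state is reported without submitting a job, the classification can be split from the sacct call. This is a sketch that restructures the script above and keeps the same rules; classify_state and query_state are names invented here, and query_state assumes sacct is on the PATH.

```python
import subprocess

RUNNING_STATES = ("PENDING", "CONFIGURING", "COMPLETING", "RUNNING", "SUSPENDED")


def classify_state(state):
    """Map a slurm job state to the word Snakemake expects."""
    if "COMPLETED" in state:
        return "success"
    if any(running in state for running in RUNNING_STATES):
        return "running"
    # Everything else (TIMEOUT, CANCELLED, FAILED, ...) counts as failed.
    return "failed"


def query_state(jobid):
    """Return the first field of the first sacct line for the job, or ''."""
    result = subprocess.run(
        ["sacct", "-j", str(jobid), "--format", "State", "--noheader"],
        capture_output=True, text=True, check=True,
    )
    lines = result.stdout.splitlines()
    fields = lines[0].split() if lines else []
    return fields[0] if fields else ""


# Can be tried without a cluster:
for state in ("COMPLETED", "RUNNING", "TIMEOUT", "CANCELLED+"):
    print(state, "->", classify_state(state))

# In an actual status script the last line would be:
#   print(classify_state(query_state(sys.argv[1])))
```

Timed-out and cancelled jobs fall through to "failed", which is what stops snakemake from waiting on them.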

Alternatively, you may use premade status scripts for several job schedulers (slurm, lsf, etc.), available via Snakemake-Profiles. The one for slurm is slurm-status.py.

Using snakemake profile

Snakemake profiles make it easy to always use certain flags and options. Various premade Snakemake-Profiles have been made available by the community/authors. For slurm, I use my own forked repo - https://github.com/ManavalanG/slurm.

!!! note
    When setting up, for submit_script, choose slurm-submit-advanced.py, as this allows the usage of the --cluster-config option.

Job logs in append mode

Use --open-mode=append with sbatch.

From sbatch doc:

--open-mode=append|truncate

        Open the output and error files using append or truncate mode as specified. The default value is specified by the system configuration parameter JobFileAppend.