Package looper Documentation

Class Project

Looper-specific NGS Project.

Parameters:

  • config_file (str): path to configuration file with data from which Project is to be built
  • subproject (str): name indicating subproject to use, optional
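Since the constructor is driven by a configuration file, a minimal sketch of such a config may help. This is a hypothetical example: the key names are assumed from the Project properties documented here (output directory, sample annotation sheet, subprojects) and may differ between versions.

```yaml
# Hypothetical minimal looper project config; key names are assumed
# from the documented Project properties and may vary by version.
metadata:
  sample_annotation: sheet.csv   # path to the sample annotation sheet
  output_dir: results            # where results and submissions folders go
  pipeline_interfaces: pipeline_interface.yaml
subprojects:
  test:                          # activated via the subproject argument
    metadata:
      sample_annotation: test_sheet.csv
```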
def build_submission_bundles(self, protocol, priority=True)

Create pipelines to submit for each sample of a particular protocol.

With the argument (flag) to the priority parameter, there's control over whether to submit pipeline(s) from only one of the project's known pipeline locations with a match for the protocol, or whether to submit pipelines created from all locations with a match for the protocol.

Parameters:

  • protocol (str): name of the protocol/library for which to create pipeline(s)
  • priority (bool): whether to submit pipeline(s) only from the first of the pipeline location(s) (indicated in the project config file) that has a match for the given protocol; optional, default True

Returns:

  • Iterable[(PipelineInterface, type, str, str)]:

Raises:

  • AssertionError: if there's a failure in the attempt to partition an interface's pipeline scripts into disjoint subsets of those already mapped and those not yet mapped
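The priority flag described above can be sketched with a small stand-in. This is not looper's implementation; `bundles_for_protocol` and its inputs are hypothetical, modeling pipeline locations as an ordered list of protocol-to-pipelines mappings.

```python
def bundles_for_protocol(locations, protocol, priority=True):
    """Collect pipeline identifiers for a protocol from known locations.

    locations: ordered list of mappings from protocol name to a list of
    pipeline identifiers (a stand-in for pipeline interface locations).
    With priority=True, only the first location with a match contributes;
    otherwise matches from every location are pooled.
    """
    bundles = []
    for location in locations:
        if protocol in location:
            bundles.extend(location[protocol])
            if priority:
                break  # first matching location wins
    return bundles
```

For example, with two locations both mapping "ATAC", the default returns only the first location's pipelines, while `priority=False` pools both.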
def constants(self)

Return key-value pairs of pan-Sample constants for this Project.

Returns:

  • Mapping: collection of KV pairs, each representing a pairing of attribute name and attribute value
def derived_columns(self)

Collection of sample attributes whose values are derived from elsewhere.

Returns:

  • list[str]: sample attribute names for which value is derived
def get_interfaces(self, protocol)

Get the pipeline interfaces associated with the given protocol.

Parameters:

  • protocol (str): name of the protocol for which to get interfaces

Returns:

  • Iterable[looper.PipelineInterface]: collection of pipeline interfaces associated with the given protocol

Raises:

  • KeyError: if the given protocol is not (perhaps yet) mapped to any pipeline interface
def get_outputs(self, skip_sample_less=True)

Map pipeline identifier to collection of output specifications.

This method joins two collections of entities that meet in a Project. The first is the collection of samples, known even to peppy.Project. The second is a mapping from protocol/assay/library strategy to a collection of pipeline interfaces, in which kinds of output may be declared. Together, these map the identifier of each pipeline this Project is aware of to a collection of pairs: an identifier for a kind of output, and the subset of this Project's samples to which it applies (i.e., those samples whose protocol maps to the corresponding pipeline).

Parameters:

  • skip_sample_less (bool): whether to omit pipelines that are for protocols of which the Project has no Sample instances

Returns:

  • Mapping[str, Mapping[str, namedtuple]]: collection of bindings between identifier for pipeline and collection of bindings between name for a kind of output and pair in which the first component is a path template and the second component is a collection of sample names

Raises:

  • TypeError: if argument to sample-less pipeline skipping parameter is not a Boolean
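The nested return shape described above can be sketched with toy data. This is a hypothetical stand-in, not looper's implementation: `map_outputs` and its inputs are invented for illustration, with samples as dicts and path templates as plain strings.

```python
def map_outputs(samples, protocol_to_pipeline, pipeline_outputs,
                skip_sample_less=True):
    """Bind each pipeline to its declared outputs and applicable samples.

    Returns {pipeline: {output_key: (path_template, [sample names])}},
    mirroring the documented Mapping[str, Mapping[str, namedtuple]] shape.
    """
    if not isinstance(skip_sample_less, bool):
        raise TypeError("skip_sample_less must be a Boolean")
    result = {}
    for pipeline, outputs in pipeline_outputs.items():
        # Samples whose protocol maps to this pipeline
        names = [s["name"] for s in samples
                 if protocol_to_pipeline.get(s["protocol"]) == pipeline]
        if skip_sample_less and not names:
            continue  # omit pipelines with no matching samples
        result[pipeline] = {key: (template, names)
                            for key, template in outputs.items()}
    return result
```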
def implied_columns(self)

Collection of sample attributes whose values are implied by other(s).

Returns:

  • list[str]: sample attribute names for which value is implied by other(s)
def num_samples(self)

Count the number of samples available in this Project.

Returns:

  • int: number of samples available in this Project.
def output_dir(self)

Directory in which to place results and submissions folders.

By default, assume that the project's configuration file specifies an output directory, and that this is therefore available within the project metadata. If that assumption does not hold, though, consider the folder in which the project configuration file lives to be the project's output directory.

Returns:

  • str: path to the project's output directory, either as specified in the configuration file or the folder that contains the project's configuration file.

Raises:

  • Exception: if this property is requested on a project that was not created from a config file and lacks output folder declaration in its metadata section
def project_folders(self)

Critical project folder keys

def protocols(self)

Determine this Project's unique protocol names.

Returns:

  • Set[str]: collection of this Project's unique protocol names
def required_metadata(self)

Which metadata attributes are required.

def results_folder(self)
def sample_annotation(self)

Get the path to the project's sample annotations sheet.

Returns:

  • str: path to the project's sample annotations sheet
def sample_names(self)

Names of samples of which this Project is aware.

def sample_subannotation(self)

Return the data table that stores metadata for subsamples/units.

Returns:

  • pandas.core.frame.DataFrame | NoneType: table of subsamples/units metadata
def sample_table(self)

Return (possibly first parsing/building) the table of samples.

Returns:

  • pandas.core.frame.DataFrame | NoneType: table of samples' metadata, if one is defined
def samples(self)

Generic/base Sample instance for each of this Project's samples.

Returns:

  • Iterable[Sample]: Sample instance for each of this Project's samples
def sheet(self)

Annotations/metadata sheet describing this Project's samples.

Returns:

  • pandas.core.frame.DataFrame: table of samples in this Project
def submission_folder(self)
def subproject(self)

Return the currently active subproject, or None if none was activated.

Returns:

  • str: name of currently active subproject
def subsample_table(self)

Return (possibly first parsing/building) the table of subsamples.

Returns:

  • pandas.core.frame.DataFrame | NoneType: table of subsamples' metadata, if the project defines such a table
def templates_folder(self)

Path to folder with default submission templates.

Returns:

  • str: path to folder with default submission templates

Class MissingMetadataException

Project needs certain metadata.

Class MissingSampleSheetError

Represent case in which sample sheet is specified but nonexistent.

Class PipelineInterface

This class parses, holds, and returns information from a YAML file that specifies how to interact with each individual pipeline. This includes both the resources to request for cluster job submission and the arguments to pass from the sample annotation metadata to the pipeline.

Parameters:

  • config (str | Mapping): path to file from which to parse configuration data, or pre-parsed configuration data.
def choose_resource_package(self, pipeline_name, file_size)

Select resource bundle for given input file size to given pipeline.

Parameters:

  • pipeline_name (str): Name of pipeline.
  • file_size (float): Size of input data (in gigabytes).

Returns:

  • MutableMapping: resource bundle appropriate for given pipeline, for given input file size

Raises:

  • ValueError: if indicated file size is negative, or if the file size value specified for any resource package is negative
  • _InvalidResourceSpecificationException: if no default resource package specification is provided
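Size-based resource selection can be sketched with a small stand-in. This is a hypothetical model, not looper's implementation: it assumes resource packages are keyed by the minimum input size (in GB) at which they apply, with a 0 entry serving as the default.

```python
def choose_resource_package(packages, file_size):
    """Pick the resource bundle whose size threshold best fits the input.

    packages: mapping from minimum input size (GB) to a resource bundle;
    the entry keyed by 0 acts as the default package.
    """
    if file_size < 0:
        raise ValueError("negative file size: {}".format(file_size))
    if any(size < 0 for size in packages):
        raise ValueError("negative resource package size threshold")
    if 0 not in packages:
        raise LookupError("no default resource package (threshold 0)")
    # Largest threshold that the input size still meets
    best = max(size for size in packages if size <= file_size)
    return packages[best]
```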
def copy(self)

Copy self to a new object.

def fetch_pipelines(self, protocol)

Fetch the mapping for a particular protocol, null if unmapped.

Parameters:

  • protocol (str): name/key for the protocol for which to fetch the pipeline(s)

Returns:

  • str | Iterable[str] | NoneType: pipeline(s) to which the given protocol is mapped, otherwise null
def fetch_sample_subtype(self, protocol, strict_pipe_key, full_pipe_path)

Determine the interface and Sample subtype for a protocol and pipeline.

Parameters:

  • protocol (str): name of the relevant protocol
  • strict_pipe_key (str): key for specific pipeline in a pipeline interface mapping declaration; this must exactly match a key in the PipelineInterface (or the Mapping that represents it)
  • full_pipe_path (str): (absolute, expanded) path to the pipeline script

Returns:

  • type: Sample subtype to use for jobs for the given protocol that use the indicated pipeline

Raises:

  • KeyError: if given a pipeline key that's not mapped in the pipelines section of this PipelineInterface
def finalize_pipeline_key_and_paths(self, pipeline_key)

Determine pipeline's full path, arguments, and strict key.

This handles the multiple ways in which a pipeline may be referred to (by key) within the mapping that defines a PipelineInterface. It also ensures proper handling of the path to the pipeline (i.e., ensuring that it's absolute), and that the argument text is appropriately parsed and passed.

Parameters:

  • pipeline_key (str): the key in the pipeline interface file used for the protocol_mappings section. Previously was the script name.

Returns:

  • (str, str, str): more precise version of input key, along with absolute path for pipeline script, and full script path + options
def get_arg_string(self, pipeline_name, sample, submission_folder_path='', **null_replacements)

For a given pipeline and sample, return the argument string.

Parameters:

  • pipeline_name (str): Name of pipeline.
  • sample (Sample): current sample for which job is being built
  • submission_folder_path (str): path to folder in which files related to submission of this sample will be placed.
  • null_replacements (dict): mapping from Sample attribute name to the value to use in the arg string if the Sample attribute's value is null

Returns:

  • str: command-line argument string for pipeline
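The null-replacement behavior can be sketched with a toy builder. This is a hypothetical stand-in, not looper's implementation: `build_arg_string` and its `arg_spec` input (a mapping from CLI flag to sample attribute name) are invented for illustration, with the sample modeled as a plain dict.

```python
def build_arg_string(arg_spec, sample, null_replacements=None):
    """Assemble a CLI argument string from sample attributes.

    arg_spec: mapping from command-line flag to sample attribute name.
    Null attribute values fall back to null_replacements (keyed by
    attribute name); attributes with no value at all are omitted.
    """
    nulls = null_replacements or {}
    parts = []
    for flag, attr in arg_spec.items():
        value = sample.get(attr)
        if value is None:
            value = nulls.get(attr)  # substitute for null attribute
        if value is not None:
            parts.append("{} {}".format(flag, value))
    return " ".join(parts)
```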
def get_attribute(self, pipeline_name, attribute_key, path_as_list=True)

Return the value of the named attribute for the pipeline indicated.

Parameters:

  • pipeline_name (str): name of the pipeline of interest
  • attribute_key (str): name of the pipeline attribute of interest
  • path_as_list (bool): whether to ensure that a string attribute is returned as a list; this is useful for safe iteration over the returned value.
def get_pipeline_name(self, pipeline)

Translate a pipeline name (e.g., stripping file extension).

Parameters:

  • pipeline (str): Pipeline name or script (top-level key in pipeline interface mapping).

Returns:

  • str: translated pipeline name, as specified in config or by stripping the pipeline's file extension
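The fallback behavior (configured name first, stripped extension second) can be sketched as follows. This is a hypothetical stand-in, not looper's implementation; `configured_names` is an invented parameter modeling names set in the config.

```python
import os


def get_pipeline_name(pipeline, configured_names=None):
    """Translate a pipeline key to a display name.

    Prefer a name configured for the key; otherwise strip the script's
    file extension (e.g. 'atacseq.py' yields 'atacseq').
    """
    names = configured_names or {}
    if pipeline in names:
        return names[pipeline]
    return os.path.splitext(os.path.basename(pipeline))[0]
```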
def iterpipes(self)

Iterate over pairs of pipeline key and interface data.

Returns:

  • iterator of (str, Mapping): Iterator over pairs of pipeline key and interface data
def missing_requirements(self, pipeline)

Determine which of a pipeline's declared requirements, if any, are unmet.

Parameters:

  • pipeline (str): key for pipeline for which to determine unmet reqs

Returns:

  • Iterable[looper.PipelineRequirement]: unmet requirements
def pipe_iface(self)

Legacy access to the pipeline key-to-interface mapping.

Returns:

  • Mapping: Binding between pipeline key and interface data
def pipeline_names(self)

Names of pipelines about which this interface is aware.

Returns:

  • Iterable[str]: names of pipelines about which this interface is aware
def pipelines_path(self)

Path to pipelines folder.

Returns:

  • str | None: path to pipelines folder, if this interface was configured with a file rather than a raw mapping.
def protomap(self)

Access protocol mapping portion of this composite interface.

Returns:

  • Mapping: binding between protocol name and pipeline key.
def select_pipeline(self, pipeline_name)

Check that the pipeline has an entry and, if so, return it.

Parameters:

  • pipeline_name (str): Name of pipeline.

Returns:

  • Mapping: configuration data for pipeline indicated

Raises:

  • MissingPipelineConfigurationException: if there's no configuration data for the indicated pipeline
def uses_looper_args(self, pipeline_name)

Determine whether indicated pipeline accepts looper arguments.

Parameters:

  • pipeline_name (str): Name of pipeline to check for looper argument acceptance.

Returns:

  • bool: Whether indicated pipeline accepts looper arguments.
def validate(self, pipeline)

Determine whether any declared requirements are unmet.

Parameters:

  • pipeline (str): key for the pipeline to validate

Returns:

  • bool: whether any declared requirements are unmet

Raises:

  • MissingPipelineConfigurationException: if the requested pipelineis not defined in this interface

Class Sample

Class to model Samples based on a pandas Series.

Parameters:

  • series (Mapping | pandas.core.series.Series): Sample's data.

Examples:

    from models import Project, SampleSheet, Sample
    prj = Project("ngs")
    sheet = SampleSheet("~/projects/example/sheet.csv", prj)
    s1 = Sample(sheet.iloc[0])
def determine_missing_requirements(self)

Determine which of this Sample's required attributes/files are missing.

Returns:

  • (type, str): hypothetical exception type along with a message about what's missing; null and empty if nothing exceptional is detected
def generate_filename(self, delimiter='_')

Create a name for file in which to represent this Sample.

This uses knowledge of the instance's subtype, sandwiching a delimiter between the name of this Sample and the name of the subtype before the extension. If the instance is a base Sample type, then the filename is simply the sample name with an extension.

Parameters:

  • delimiter (str): what to place between sample name and name of subtype; this is only relevant if the instance is of a subclass

Returns:

  • str: name for file with which to represent this Sample on disk
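The naming rule described above (delimiter sandwiched between sample name and subtype name before the extension, or just name plus extension for a base Sample) can be sketched as follows. This is a hypothetical stand-in, not looper's implementation; the `.yaml` extension is an assumption for illustration.

```python
def generate_filename(sample_name, subtype_name=None,
                      delimiter="_", extension=".yaml"):
    """Build a filename to represent a sample on disk.

    A subtype name, if given, is sandwiched between the sample name
    and the extension using the delimiter; otherwise the filename is
    simply the sample name plus the extension.
    """
    if subtype_name:
        return sample_name + delimiter + subtype_name + extension
    return sample_name + extension
```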
def input_file_paths(self)

List the sample's data source / input files

Returns:

  • list[str]: paths to data sources / input file for this Sample.
def library(self)

Backwards-compatible alias.

Returns:

  • str: The protocol / NGS library name for this Sample.
def set_pipeline_attributes(self, pipeline_interface, pipeline_name, permissive=True)

Set pipeline-specific sample attributes.

Some sample attributes are relative to a particular pipeline run, like which files should be considered inputs, what is the total input file size for the sample, etc. This function sets these pipeline-specific sample attributes, provided via a PipelineInterface object and the name of a pipeline to select from that interface.

Parameters:

  • pipeline_interface (PipelineInterface): A PipelineInterface object that has the settings for this given pipeline.
  • pipeline_name (str): Which pipeline to choose.
  • permissive (bool): whether to simply log a warning or error message rather than raising an exception if sample file is not found or otherwise cannot be read, default True
def set_read_type(self, rlen_sample_size=10, permissive=True)

For a sample with the ngs_inputs attribute set, this sets the read type (single or paired) and read length of an input file.

Parameters:

  • rlen_sample_size (int): Number of reads to sample to infer read type; default 10.
  • permissive (bool): whether to simply log a warning or error message rather than raising an exception if sample file is not found or otherwise cannot be read; default True.

Class SubmissionConductor

Collects and then submits pipeline jobs.

This class holds a 'pool' of commands to submit as a single cluster job. Each instance's collection of commands grows until the 'pool' is full, at which point it's time to submit the job. The pool fills as soon as a fill criterion is met, which can be either total input file size or the number of individual commands.
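The fill-and-submit behavior can be sketched with a toy pool. This is a hypothetical model, not looper's SubmissionConductor: the class name, thresholds, and in-memory "submission" list are invented for illustration.

```python
class CommandPool:
    """Toy pool: collect commands until the command-count or cumulative
    input-size threshold is reached, then 'submit' the batch."""

    def __init__(self, max_cmds=2, max_size=4.0):
        self.max_cmds = max_cmds      # fill criterion: number of commands
        self.max_size = max_size      # fill criterion: total input size (GB)
        self.commands = []
        self.size = 0.0
        self.submitted = []           # stand-in for actual job submission

    def add(self, command, input_size):
        """Pool a command; submit if a fill criterion is now met."""
        self.commands.append(command)
        self.size += input_size
        return self.submit()

    def submit(self, force=False):
        """Submit the pooled commands if forced or the pool is full."""
        full = (len(self.commands) >= self.max_cmds
                or self.size >= self.max_size)
        if (force or full) and self.commands:
            self.submitted.append(list(self.commands))
            self.commands, self.size = [], 0.0
            return True
        return False
```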

def add_sample(self, sample, rerun=False)

Add a sample for submission to this conductor.

Parameters:

  • sample (peppy.Sample): sample to be included with this conductor's currently growing collection of command submissions
  • rerun (bool): whether the given sample is being rerun rather than run for the first time

Returns:

  • bool: Indication of whether the given sample was added to the current 'pool'.

Raises:

  • TypeError: If sample subtype is provided but does not extend the base Sample class, raise a TypeError.
def failed_samples(self)
def num_cmd_submissions(self)

Return the number of commands that this conductor has submitted.

Returns:

  • int: Number of commands submitted so far.
def num_job_submissions(self)

Return the number of jobs that this conductor has submitted.

Returns:

  • int: Number of jobs submitted so far.
def submit(self, force=False)

Submit one or more commands as a job.

This call will submit the commands corresponding to the current pool of samples if and only if the argument to 'force' evaluates to a true value, or the pool of samples is full.

Parameters:

  • force (bool): Whether submission should be done/simulated even if this conductor's pool isn't full.

Returns:

  • bool: Whether a job was submitted (or would've been if not for dry run)
def write_script(self, pool, size)

Create the script for job submission.

Parameters:

  • pool (Iterable[(peppy.Sample, str)]): collection of pairs in which the first component is a sample instance and the second is a command/argument string
  • size (float): cumulative size of the given pool

Returns:

  • str: Path to the job submission script created.
def write_skipped_sample_scripts(self)

For any sample skipped during initial processing, write submission script.

Version Information: looper v0.12.4, generated by lucidoc v0.4.1