Looper-specific NGS Project.
config_file (str): path to configuration file with data from which Project is to be built
subproject (str): name indicating subproject to use; optional
def build_submission_bundles(self, protocol, priority=True)
Create pipelines to submit for each sample of a particular protocol.
The flag passed as the priority argument controls whether to submit pipeline(s) from only the first of the project's known pipeline locations with a match for the protocol, or from all locations with a match for the protocol.
protocol (str): name of the protocol/library for which to create pipeline(s)
priority (bool): whether to submit pipeline(s) from only the first of the pipeline location(s) (indicated in the project config file) that has a match for the given protocol; optional, default True
Iterable[(PipelineInterface, type, str, str)]: submission bundle(s) for the given protocol
AssertionError: if there's a failure in the attempt to partition an interface's pipeline scripts into disjoint subsets of those already mapped and those not yet mapped
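For illustration, a minimal usage sketch; the config path and protocol name are hypothetical, and `Project` is assumed importable from the package root:

```python
from looper import Project

prj = Project("project_config.yaml")  # hypothetical project config

# With priority=True (the default), only the first pipeline location
# matching the protocol contributes bundles.
for iface, subtype, pipe_key, script_with_flags in prj.build_submission_bundles("ATAC-seq"):
    print(pipe_key, script_with_flags)
```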
Return key-value pairs of pan-Sample constants for this Project.
Mapping: collection of KV pairs, each representing a pairing of attribute name and attribute value
Collection of sample attributes whose values are derived from elsewhere
list[str]: sample attribute names for which value is derived
def get_interfaces(self, protocol)
Get the pipeline interfaces associated with the given protocol.
protocol (str): name of the protocol for which to get interfaces
Iterable[looper.PipelineInterface]: collection of pipeline interfaces associated with the given protocol
KeyError: if the given protocol is not (perhaps yet) mapped to any pipeline interface
def get_outputs(self, skip_sample_less=True)
Map pipeline identifier to collection of output specifications.
This method draws on two collections of entities that meet in the manifestation of a Project. The first is the collection of samples, which is known even in peppy.Project. The second is a mapping from protocol/assay/library strategy to a collection of pipeline interfaces, in which kinds of output may be declared. Together, these map the identifier of each pipeline this Project knows about to a collection of pairs: an identifier for a kind of output, and the collection of this Project's samples for which it's applicable (i.e., those samples whose protocol maps to the corresponding pipeline).
skip_sample_less (bool): whether to omit pipelines that are for protocols of which the Project has no Sample instances
Mapping[str, Mapping[str, namedtuple]]: collection of bindings between identifier for pipeline and collection of bindings between name for a kind of output and a pair in which the first component is a path template and the second component is a collection of sample names
TypeError: if argument to sample-less pipeline skipping parameter is not a Boolean
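A sketch of walking the returned nested mapping, reusing `prj` from the sketch above; the two-element unpacking follows the pair described in the return spec:

```python
for pipeline_id, outputs in prj.get_outputs(skip_sample_less=True).items():
    for output_kind, (path_template, sample_names) in outputs.items():
        print(pipeline_id, output_kind, path_template, list(sample_names))
```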
Collection of sample attributes whose values are implied by other attribute(s)
list[str]: sample attribute names for which value is implied by other(s)
Count the number of samples available in this Project.
int: number of samples available in this Project.
Directory in which to place results and submissions folders.
By default, assume that the project's configuration file specifies an output directory, and that this is therefore available within the project metadata. If that assumption does not hold, though, consider the folder in which the project configuration file lives to be the project's output directory.
str: path to the project's output directory, either as specified in the configuration file or the folder that contains the project's configuration file.
Exception: if this property is requested on a project that was not created from a config file and lacks output folder declaration in its metadata section
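The fallback described above behaves roughly like the sketch below; the attribute names here are illustrative, not the actual implementation:

```python
import os

def output_dir_fallback(prj):
    # Prefer an output directory declared in the project metadata;
    # otherwise use the folder containing the project config file.
    declared = getattr(prj.metadata, "output_dir", None)  # hypothetical attribute
    return declared or os.path.dirname(prj.config_file)
```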
Critical project folder keys
Determine this Project's unique protocol names.
Set[str]: collection of this Project's unique protocol names
Which metadata attributes are required.
Get the path to the project's sample annotations sheet.
str: path to the project's sample annotations sheet
Names of samples of which this Project is aware.
Return the data table that stores metadata for subsamples/units.
pandas.core.frame.DataFrame | NoneType: table of subsamples/units metadata
Return (possibly first parsing/building) the table of samples.
pandas.core.frame.DataFrame | NoneType: table of samples' metadata, if one is defined
Generic/base Sample instance for each of this Project's samples.
Iterable[Sample]: Sample instance for each of this Project's samples
Annotations/metadata sheet describing this Project's samples.
pandas.core.frame.DataFrame: table of samples in this Project
Return currently active subproject or None if none was activated
str: name of currently active subproject
Return (possibly first parsing/building) the table of subsamples.
pandas.core.frame.DataFrame | NoneType: table of subsamples' metadata, if the project defines such a table
Path to folder with default submission templates.
str: path to folder with default submission templates
Project needs certain metadata.
Represent case in which sample sheet is specified but nonexistent.
This class parses, holds, and returns information for a YAML file that specifies how to interact with each individual pipeline. This includes both resources to request for cluster job submission and arguments to be passed from the sample annotation metadata to the pipeline.
config (str | Mapping): path to file from which to parse configuration data, or pre-parsed configuration data.
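A minimal construction sketch; the import path may differ by version, and the file name is hypothetical:

```python
from looper import PipelineInterface

# Build from a path to a pipeline interface file...
pi = PipelineInterface("pipeline_interface.yaml")
# ...or pass an already-parsed Mapping with the same structure instead.
```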
def choose_resource_package(self, pipeline_name, file_size)
Select resource bundle for given input file size to given pipeline.
pipeline_name (str): Name of pipeline.
file_size (float): Size of input data (in gigabytes).
MutableMapping: resource bundle appropriate for given pipeline, for given input file size
ValueError: if indicated file size is negative, or if the file size value specified for any resource package is negative
_InvalidResourceSpecificationException: if no default resource package specification is provided
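Usage sketch, reusing `pi` from above; the pipeline name and resource keys are illustrative:

```python
# Pick the resource bundle sized for a 4.0 GB input.
resources = pi.choose_resource_package("atacseq", 4.0)
print(resources.get("cores"), resources.get("mem"), resources.get("time"))
```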
Copy self to a new object.
def fetch_pipelines(self, protocol)
Fetch the mapping for a particular protocol, null if unmapped.
protocol (str): name/key for the protocol for which to fetch the pipeline(s)
str | Iterable[str] | NoneType: pipeline(s) to which the given protocol is mapped, otherwise null
def fetch_sample_subtype(self, protocol, strict_pipe_key, full_pipe_path)
Determine the interface and Sample subtype for a protocol and pipeline.
protocol (str): name of the relevant protocol
strict_pipe_key (str): key for specific pipeline in a pipeline interface mapping declaration; this must exactly match a key in the PipelineInterface (or the Mapping that represents it)
full_pipe_path (str): (absolute, expanded) path to the pipeline script
type: Sample subtype to use for jobs for the given protocol that use the indicated pipeline
KeyError: if given a pipeline key that's not mapped in the pipelines section of this PipelineInterface
def finalize_pipeline_key_and_paths(self, pipeline_key)
Determine pipeline's full path, arguments, and strict key.
This handles multiple ways in which to refer to a pipeline (by key) within the mapping that contains the data that defines a PipelineInterface. It also ensures proper handling of the path to the pipeline (i.e., ensuring that it's absolute) and that the text for the arguments is appropriately parsed and passed.
pipeline_key (str): the key in the pipeline interface file used for the protocol_mappings section. Previously this was the script name.
(str, str, str): more precise version of input key, along with absolute path for pipeline script, and full script path + options
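Sketch of resolving a loosely specified key (hypothetical script name), reusing `pi`:

```python
strict_key, full_path, full_with_flags = pi.finalize_pipeline_key_and_paths("atacseq.py")
print(strict_key)       # exact key as it appears in the interface
print(full_path)        # absolute path to the pipeline script
print(full_with_flags)  # script path plus any configured options
```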
def get_arg_string(self, pipeline_name, sample, submission_folder_path='', **null_replacements)
For a given pipeline and sample, return the argument string.
pipeline_name (str): Name of pipeline.
sample (Sample): current sample for which job is being built
submission_folder_path (str): path to folder in which files related to submission of this sample will be placed.
null_replacements (dict): mapping from name of Sample attribute to value to use in arg string if Sample attribute's value is null
str: command-line argument string for pipeline
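Sketch of assembling a full command for one sample, assuming `pi`, a sample `s1`, and `full_with_flags` from the previous sketch:

```python
argstring = pi.get_arg_string("atacseq", s1, submission_folder_path="submission")
command = full_with_flags + argstring  # complete command line to submit
```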
def get_attribute(self, pipeline_name, attribute_key, path_as_list=True)
Return the value of the named attribute for the pipeline indicated.
pipeline_name (str): name of the pipeline of interest
attribute_key (str): name of the pipeline attribute of interest
path_as_list (bool): whether to ensure that a string attribute is returned as a list; this is useful for safe iteration over the returned value.
def get_pipeline_name(self, pipeline)
Translate a pipeline name (e.g., stripping file extension).
pipeline (str): Pipeline name or script (top-level key in pipeline interface mapping).
str: translated pipeline name, as specified in config or by stripping the pipeline's file extension
Iterate over pairs of pipeline key and interface data.
iterator of (str, Mapping): Iterator over pairs of pipeline key and interface data
def missing_requirements(self, pipeline)
Determine which requirements--if any--declared by a pipeline are unmet.
pipeline (str): key for pipeline for which to determine unmet requirements
Iterable[looper.PipelineRequirement]: unmet requirements
Old-way access to pipeline key-to-interface mapping
Mapping: Binding between pipeline key and interface data
Names of pipelines about which this interface is aware.
Iterable[str]: names of pipelines about which this interface is aware
Path to pipelines folder.
str | None: Path to pipelines folder, if configured with a file rather than with a raw mapping.
Access protocol mapping portion of this composite interface.
Mapping: binding between protocol name and pipeline key.
def select_pipeline(self, pipeline_name)
Check to make sure that pipeline has an entry and if so, return it.
pipeline_name (str): Name of pipeline.
Mapping: configuration data for pipeline indicated
MissingPipelineConfigurationException: if there's no configuration data for the indicated pipeline
def uses_looper_args(self, pipeline_name)
Determine whether indicated pipeline accepts looper arguments.
pipeline_name (str): Name of pipeline to check for looper argument acceptance.
bool: Whether indicated pipeline accepts looper arguments.
def validate(self, pipeline)
Determine whether any declared requirements are unmet.
pipeline (str): key for the pipeline to validate
bool: whether any declared requirements are unmet
MissingPipelineConfigurationException: if the requested pipeline is not defined in this interface
Class to model Samples based on a pandas Series.
series (Mapping | pandas.core.series.Series): Sample's data.
from models import Project, SampleSheet, Sample
prj = Project("ngs")
sheet = SampleSheet("~/projects/example/sheet.csv", prj)
s1 = Sample(sheet.iloc[0])
Determine which of this Sample's required attributes/files are missing.
(type, str): hypothetical exception type along with message about what's missing; null and empty if nothing exceptional is detected
def generate_filename(self, delimiter='_')
Create a name for file in which to represent this Sample.
This uses knowledge of the instance's subtype, sandwiching a delimiter between the name of this Sample and the name of the subtype before the extension. If the instance is a base Sample type, then the filename is simply the sample name with an extension.
delimiter (str): what to place between sample name and name of subtype; this is only relevant if the instance is of a subclass
str: name for file with which to represent this Sample on disk
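For example (names hypothetical): a base Sample named sample1 is represented by a file named for the sample alone, while a subtype inserts its class name after the delimiter:

```python
fname = s1.generate_filename()  # base Sample: sample name + extension
# For an instance of a hypothetical subclass ChIPSample(Sample),
# the result would be the sample name + "_" + "ChIPSample" + extension.
```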
List the sample's data source / input files
list[str]: paths to data sources / input file for this Sample.
str: The protocol / NGS library name for this Sample.
def set_pipeline_attributes(self, pipeline_interface, pipeline_name, permissive=True)
Set pipeline-specific sample attributes.
Some sample attributes are relative to a particular pipeline run, like which files should be considered inputs, what is the total input file size for the sample, etc. This function sets these pipeline-specific sample attributes, provided via a PipelineInterface object and the name of a pipeline to select from that interface.
pipeline_interface (PipelineInterface): A PipelineInterface object that has the settings for this given pipeline.
pipeline_name (str): Which pipeline to choose.
permissive (bool): whether to simply log a warning or error message rather than raising an exception if sample file is not found or otherwise cannot be read; default True
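Usage sketch, reusing `pi` and `s1` from earlier sketches:

```python
s1.set_pipeline_attributes(pi, pipeline_name="atacseq", permissive=True)
# Afterward, pipeline-relative attributes (e.g., the total input file
# size mentioned above) are available on the sample.
```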
def set_read_type(self, rlen_sample_size=10, permissive=True)
For a sample with attr ngs_inputs set, this sets the read type (single, paired) and read length of an input file.
rlen_sample_size (int): Number of reads to sample to infer read type; default 10.
permissive (bool): whether to simply log a warning or error message rather than raising an exception if sample file is not found or otherwise cannot be read; default True.
Collects and then submits pipeline jobs.
This class holds a 'pool' of commands to submit as a single cluster job. Each instance's collection of commands grows until the pool has been filled, at which point it's time to submit the job. The pool fills as soon as a fill criterion has been reached, which can be either total input file size or the number of individual commands.
def add_sample(self, sample, rerun=False)
Add a sample for submission to this conductor.
sample (peppy.Sample): sample to be included with this conductor's currently growing collection of command submissions
rerun (bool): whether the given sample is being rerun rather than run for the first time
bool: Indication of whether the given sample was added to the current 'pool.'
TypeError: if sample subtype is provided but does not extend the base Sample class
Return the number of commands that this conductor has submitted.
int: Number of commands submitted so far.
Return the number of jobs that this conductor has submitted.
int: Number of jobs submitted so far.
def submit(self, force=False)
Submit one or more commands as a job.
This call will submit the commands corresponding to the current pool of samples if and only if the argument to 'force' evaluates to a true value, or the pool of samples is full.
force (bool): Whether submission should be done/simulated even if this conductor's pool isn't full.
bool: Whether a job was submitted (or would've been if not for dry run)
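Putting the conductor together, as a sketch; construction of the conductor itself is not covered in this section, so `conductor` is assumed to already exist:

```python
# Assume `conductor` is an already-built SubmissionConductor and
# `prj` a built Project.
for s in prj.samples:
    conductor.add_sample(s)   # pool may auto-submit as it fills
conductor.submit(force=True)  # flush any partial pool at the end
```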
def write_script(self, pool, size)
Create the script for job submission.
pool (Iterable[(peppy.Sample, str)]): collection of pairs in which first component is a sample instance and second is command/argstring
size (float): cumulative size of the given pool
str: Path to the job submission script created.
For any sample skipped during initial processing, write submission script.
looper v0.12.4