The main configuration file is written in YAML and pre-processed through Jinja, to allow things like defining variables, macros, etc.
Define an URL pointing to the storage (for build status).
storage: "postgresql://jobcontrol_test:test@localhost:5432/jc-harvester-141125"
Configuration for the web application.
Uppercase names will be merged with standard Flask configuration.
webapp:
PORT: 5050
DEBUG: False
Configuration for Celery (the asynchronous task running library).
See all the possible configuration options here: http://docs.celeryproject.org/en/latest/configuration.html
celery:
BROKER_URL: "redis://localhost:6379"
Job definition is a list of objects like this:
id: some_job_id
title: "Some job title here"
function: mypackage.mymodule:myfunction
args:
- spam
- eggs
- bacon
kwargs:
foobar: 'Something completely different'
blah: !retval 'some_other_job'
dependencies: ['some_other_job']
..which tells JobControl to run something roughly equivalent to:
from mypackage.mymodule import myfunction
myfunction('spam', 'eggs', 'bacon',
foobar='Something completely different',
blah=get_return_value('some_other_job'))
Where the (immaginary) get_return_value() function returns the return value from the latest successful build of the specified job (which must be amongst the job dependencies).
For example, let’s say we want to “crawl” and “process” a bunch of websites.
We could use a macro like this to keep repetitions at minimum:
{% macro process_website(name, url) %}
- id: crawl_{{ name }}
title: "Crawl {{ url }}"
function: mycrawler:crawl
kwargs:
storage: postgresql://.../crawled_data_{{ name }}
- id: process_{{ name }}
title: "Process {{ url }}"
function: mycrawler:process
kwargs:
input_storage: !retval crawl_{{ name }}
storage: postgresql://.../processed_data_{{ name }}
{% endmacro %}
jobs:
{{ process_website('example_com', 'http://www.example.com') }}
{{ process_website('example_org', 'http://www.example.org') }}
{{ process_website('example_net', 'http://www.example.net') }}
Will get expanded to:
jobs:
- id: crawl_example_com
title: "Crawl http://www.example.com"
function: mycrawler:crawl
kwargs:
storage: postgresql://.../crawled_data_example_com
- id: process_example_com
title: "Process http://www.example.com"
function: mycrawler:process
kwargs:
input_storage: !retval crawl_example_com
storage: postgresql://.../processed_data_example_com
- id: crawl_example_org
title: "Crawl http://www.example.org"
function: mycrawler:crawl
kwargs:
storage: postgresql://.../crawled_data_example_org
- id: process_example_org
title: "Process http://www.example.org"
function: mycrawler:process
kwargs:
input_storage: !retval crawl_example_org
storage: postgresql://.../processed_data_example_org
- id: crawl_example_net
title: "Crawl http://www.example.net"
function: mycrawler:crawl
kwargs:
storage: postgresql://.../crawled_data_example_net
- id: process_example_net
title: "Process http://www.example.net"
function: mycrawler:process
kwargs:
input_storage: !retval crawl_example_net
storage: postgresql://.../processed_data_example_net
Warning
Mind the indentation! The best way is to use the desired final indentation in the macro definition, then call the macro at “zero” indentation level.