Designs

Real-world goals

  • Quantum ESPRESSO simulations, made easier.
  • SSSP verification with minimal manual labor.
  • Bioinformatics using container technologies.
  • Machine learning using HyperQueue.

Milestones

  • Run CP2K simulations on a local laptop with a native container runtime integrated (bare-machine launch).
  • Run CP2K simulations on HPC through SSH. (Let QE run wherever it is.)
  • Run Python-based machine learning on a hybrid local + GPU cluster setup.
  • Run a typical bioinformatics RNA data processing pipeline in the cloud.
  • Support WDL by parsing the spec into my AST.
  • Run some examples through HyperQueue.

Roadmap

In the prototype phase, it would be nice to show the power of the new self-baked syntax and runtime with the following features.

  • process as the inner task driver, constructed as a generic state machine.
  • trivial arithmetic tasks wrapped as processes for consistency.
  • shell tasks that can be paused/resumed/killed asynchronously.
  • base syntax tree to evaluate arithmetic expressions.
  • base syntax tree representation to evaluate shell commands.
  • pipeline shell tasks through syntax tree.
  • tracing log: separate print statements from logs sent to tracing.
  • Builtin binary expressions.
  • standard lexing and parsing into my syntax tree.
  • control flow: if..else, while loop
  • array-like type for holding multiple return values, usable as the iterator in a for loop (Python-like syntax).
  • para block using shell example.
  • customized provenance written to file.
  • pipeline shell syntax sugar.
  • miette to show nice syntax errors.
  • design doc for the syntax specification, as an mdBook (https://github.com/oxiida/book).
  • FFI through pyo3 to embed oxiida-lang into Python scripts.
  • (*) workflow reuse
  • (*) module import.
  • Clear scope and variable management (plan to use a var stack).
  • Performance: work around the GIL with multiprocessing.
  • versatile runtime that listens for tasks and launches them.
  • sqlite as the default persistence.
  • Default config folder at .config/oxiida/, with profile switching by switching the config. Should also manage the proper persistence target between run and submit to daemon.
  • query the DB and provide a lame version of graph printing.
  • (*) an FFI call from Python should be able to return a value to Python; the workflow can return its result as an expression.
  • (*) traverse the AST and catch as many errors as possible at parse time.
  • (*) a statement should return the value of its last expression; a para block should return an array in lexical order.
  • para while (error-prone, thus disallowed) and para for syntax. (see https://docs.julialang.org/en/v1/manual/multi-threading/#The-@threads-Macro)
  • (*) snapshot of a syntax tree, serializing enough detail to restore it. (This requires changing from a recursive interpreter to a flat-loop interpreter, a big refactoring.)
  • graceful cancellation on termination signals, using SIGTERM.
  • static and strict data types.
  • type checking for the ffi function declaration.
  • separate the bin and lib parts into independent crates.
  • pre-runtime type check for array and variable assignment.
  • tracing for crucial flow nodes.
  • (*) Support container tasks
  • Support ssh tasks
  • Support container tasks over ssh wire.
  • chores: TODO/XXX
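
The first items in the list above (a process driven by a generic state machine, with tasks that can be paused, resumed, or killed) can be sketched roughly as follows. This is an illustration in Python, not oxiida's actual Rust implementation; the names `State`, `Event`, `TRANSITIONS`, and `Process` are hypothetical, and the exact set of states is an assumption.

```python
from enum import Enum, auto


class State(Enum):
    CREATED = auto()
    RUNNING = auto()
    PAUSED = auto()
    FINISHED = auto()
    KILLED = auto()


class Event(Enum):
    RUN = auto()
    PAUSE = auto()
    RESUME = auto()
    KILL = auto()
    DONE = auto()


# Allowed transitions: (current state, event) -> next state.
# Anything not listed here is rejected (assumed policy for this sketch).
TRANSITIONS = {
    (State.CREATED, Event.RUN): State.RUNNING,
    (State.CREATED, Event.KILL): State.KILLED,
    (State.RUNNING, Event.PAUSE): State.PAUSED,
    (State.RUNNING, Event.DONE): State.FINISHED,
    (State.RUNNING, Event.KILL): State.KILLED,
    (State.PAUSED, Event.RESUME): State.RUNNING,
    (State.PAUSED, Event.KILL): State.KILLED,
}


class Process:
    """Hypothetical task driver: the same machine can wrap any task kind."""

    def __init__(self, name):
        self.name = name
        self.state = State.CREATED
        self.history = [State.CREATED]  # kept so transitions can be traced

    def handle(self, event):
        """Apply one event, or raise if it is not valid in the current state."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(
                f"{self.name}: {event.name} not allowed in {self.state.name}"
            )
        self.state = TRANSITIONS[key]
        self.history.append(self.state)
        return self.state

    def is_terminal(self):
        return self.state in (State.FINISHED, State.KILLED)
```

Keeping the transitions in one table means a trivial arithmetic task and a long-running shell task can share the same driver; only the work done while in `RUNNING` differs.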

After the prototype, incrementally add more nice-to-have, proper language features:

  • re-examine the design, especially where the reactor pattern is used. Draw figures for clear demonstration.
  • error handling for tasks: how to represent it in the syntax? The Rust way, with a match keyword?
  • ergonomics: the py daemon attaches env mutation info to the reply message sent to the client.
  • when using the =_= syntax, provide a way to control the upper bound on the number of available processes.
  • FFI call from julia/lua
  • traverse the AST and generate the control flow graph.
  • break and continue keywords.
  • let's build a REPL; oxiida is an interpreted language.
  • ?? local keyword for variable shadowing.
  • performance: expressions in an array can be evaluated concurrently.
  • parallelize expression evaluation and hold the future, joining when the value is used.
  • separate the CLI and library parts and make the library a dedicated crate.
  • separate the pyo3 part as a Python SDK.
  • release fixed version of specification with the basic language primitives.
  • clean out the dependencies that I can implement on my own. The list:
    • corfig and directories-rs crates for profile/config setting.
    • rmp-serde for codec to msgpack.
  • For the Python binding, clean out the dependencies that I can implement on my own. The list:
    • serde-pyobject for converting PyAny back and forth.
    • serde_json, because the project should not be bound to JSON as its serialization format.
  • Consider native Windows support. I currently use nix, which blocks Windows support for the moment, but as a DSL it should be supported.
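
Two of the performance items above (evaluating array elements concurrently, and holding a future that is joined only when the value is used) can be sketched as below, along with an upper bound on concurrency. This is a Python illustration using the standard `concurrent.futures` module, not oxiida's implementation; `eval_array`, `Deferred`, and `slow` are hypothetical names, and threads are used only to keep the sketch short.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def eval_array(thunks, max_workers=2):
    """Evaluate zero-argument callables concurrently.

    Results come back in the order the elements were written, not in the
    order they finish. `max_workers` caps how many run at once.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(thunk) for thunk in thunks]
        return [future.result() for future in futures]


class Deferred:
    """Start evaluating right away; block only when the value is used."""

    def __init__(self, pool, thunk):
        self._future = pool.submit(thunk)

    def get(self):
        # Joins the future: returns immediately if the work is already done.
        return self._future.result()


def slow(value, delay):
    """Stand-in for an expensive expression (assumed for the example)."""
    time.sleep(delay)
    return value


if __name__ == "__main__":
    # "a" finishes last but still comes first, matching its written position.
    print(eval_array([lambda: slow("a", 0.05), lambda: slow("b", 0.0)]))
```

Collecting results by submission order rather than completion order is what lets a parallel block return its array in lexical order.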

Now I can make the repo public and release 0.1.0 to crates.io.

After 0.1.0

  • anonymous functions and workflows.
  • parser for WDL
  • parser for Nextflow
  • parser for CWL

Misc