ruote tmp/log_2012-11-06.html

2012-11-06 03:03:14 utc

mburnett

so i have a situation where i have N concurrent jobs that get submitted to a remote process via AMQP, then i need to wait a long time and then resume the process once those concurrent jobs have all finished

2012-11-06 03:03:34 utc

mburnett

what's a good/the right way to approach that?

2012-11-06 03:53:04 utc

mburnett

nevermind, i was being foolish about how receivers worked
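For readers with the same question: the general shape of "submit N jobs, then resume once all have replied" can be sketched in plain Ruby. This is only an illustration of the wait-for-all pattern, not ruote or AMQP code. `run_job` and `run_all` are hypothetical names, and a thread stands in for each remote job.

```ruby
require 'thread'

# Hypothetical stand-in for a remote job; a real setup would publish
# the job over AMQP and get the result back through a receiver.
def run_job(n)
  n * n
end

# Fan out one thread per job, push each reply onto a shared queue,
# then block until exactly jobs.size replies have arrived.
def run_all(jobs)
  replies = Queue.new
  jobs.each { |n| Thread.new { replies << [n, run_job(n)] } }
  Array.new(jobs.size) { replies.pop }.sort
end

results = run_all([1, 2, 3])
puts results.inspect
```

The caller only resumes after the last `replies.pop` returns, which is the behaviour the question asks for.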

2012-11-06 16:49:25 utc

mburnett

is there a typical way of reporting an error on a workitem received via AMQP? I see a thread from mid-to-late 2011 on the mailing list, but I'm having a hard time understanding how to apply that to my case.

2012-11-06 20:04:33 utc

mburnett

ah, it seems that Ruote::Amqp::Receiver#flunk has a different interface from Ruote::Receiver#flunk

2012-11-06 21:02:31 utc

jmettraux

mburnett: hello, yes, #flunk is used to pass errors back from the receivers

2012-11-06 21:02:52 utc

mburnett

yeah, i was just passing it all the wrong stuff :)

2012-11-06 21:03:59 utc

mburnett

now i just need to get a curl-friendly interface up to have a complete tracer bullet

2012-11-06 22:58:37 utc

mburnett

how do i abort components of a process that depend on a failed component without killing everything?

2012-11-06 22:58:50 utc

jmettraux

what is a component?

2012-11-06 22:59:09 utc

mburnett

an Amqp::Receiver in this case

2012-11-06 22:59:40 utc

jmettraux

it's not a component of a process

2012-11-06 22:59:42 utc

mburnett

i know that the idea is that failed processes will be administered and error sections corrected

2012-11-06 23:00:02 utc

mburnett

maybe I should just put up a gist

2012-11-06 23:00:09 utc

jmettraux

I can tell you how to cancel parts of a workflow instance

2012-11-06 23:00:29 utc

mburnett

ok

2012-11-06 23:00:32 utc

jmettraux

do you need a way to unregister an Amqp::Receiver?

2012-11-06 23:00:43 utc

jmettraux

and make it unsubscribe?

2012-11-06 23:00:47 utc

mburnett

maybe i should just fill in some background

2012-11-06 23:00:54 utc

mburnett

and you can tell me how that's the wrong design :)

2012-11-06 23:01:10 utc

jmettraux

maybe an email to the mailing list would be more appropriate

2012-11-06 23:01:28 utc

mburnett

ok

2012-11-06 23:02:37 utc

jmettraux

breakfast here

2012-11-06 23:02:49 utc

jmettraux

I'm OK to help via IRC, but please remember I cannot read your mind

2012-11-06 23:13:54 utc

mburnett

well, here's the gist: https://gist.github.com/4028287

2012-11-06 23:14:00 utc

mburnett

if you like, i'll post more details to the mailing list

2012-11-06 23:15:39 utc

jmettraux

what is the question?

2012-11-06 23:16:23 utc

mburnett

so the question is basically "what's the right way to handle failed grid jobs"

2012-11-06 23:16:35 utc

mburnett

right now i'm doing flunk()

2012-11-06 23:16:37 utc

jmettraux

that's very deep

2012-11-06 23:16:47 utc

mburnett

ok, so then let's narrow the scope

2012-11-06 23:16:52 utc

jmettraux

flunk() will pass the error to ruote

2012-11-06 23:16:54 utc

mburnett

what's a reasonable way to handle failed grid jobs here

2012-11-06 23:17:20 utc

jmettraux

so flunk() is read, IMHO

2012-11-06 23:17:27 utc

jmettraux

so flunk() is right, IMHO
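The reply-versus-flunk decision discussed here can be sketched with a simplified model. `SketchReceiver` below is hypothetical and is not the ruote API. As noted earlier in the log, the real `flunk` signatures differ between Ruote::Receiver and Ruote::Amqp::Receiver, so the argument shapes used here are assumptions made only for illustration.

```ruby
# Hypothetical, simplified model of a receiver; NOT the ruote API.
class SketchReceiver
  attr_reader :replies, :errors

  def initialize
    @replies = []
    @errors = []
  end

  # Normal completion: hand the workitem back to the engine.
  def receive(workitem)
    @replies << workitem
  end

  # Failure: hand the workitem back together with the error.
  def flunk(workitem, error)
    @errors << { 'workitem' => workitem, 'message' => error.message }
  end

  # Route an incoming job result (a plain Hash here) to receive or flunk.
  def handle(result)
    if result['error']
      flunk(result['workitem'], RuntimeError.new(result['error']))
    else
      receive(result['workitem'])
    end
  end
end

receiver = SketchReceiver.new
receiver.handle('workitem' => { 'job' => 'a' })
receiver.handle('workitem' => { 'job' => 'b' }, 'error' => 'grid job failed')
```

The point of the sketch is only that a failed job goes back through the error path rather than being reported as a normal reply.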

2012-11-06 23:17:42 utc

mburnett

right, that seems to work. i guess the behavior that most closely matches our existing infrastructure is that the process is marked as failed, but any non-dependent parts of the process are still run

2012-11-06 23:18:02 utc

jmettraux

that's the default behaviour

2012-11-06 23:18:10 utc

mburnett

ah

2012-11-06 23:18:44 utc

jmettraux

if you have two concurrent ruote branches and one ends up in an error, the other will go on
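The behaviour described here (one branch ends in an error, the other goes on) can be modelled in plain Ruby. This is a sketch of the idea only, with threads standing in for ruote branches. `run_branches` and the branch names are hypothetical.

```ruby
# Run each branch in its own thread and record its outcome separately,
# so an exception in one branch never stops the others.
def run_branches(branches)
  threads = branches.map do |name, work|
    Thread.new do
      begin
        [name, { status: :done, value: work.call }]
      rescue StandardError => e
        [name, { status: :error, message: e.message }]
      end
    end
  end
  # Thread#value joins the thread and returns its [name, outcome] pair.
  threads.map(&:value).to_h
end

outcome = run_branches({
  'alpha' => -> { 42 },
  'bravo' => -> { raise 'grid job failed' }
})

# The process as a whole is flagged as failed if any branch errored.
process_failed = outcome.values.any? { |o| o[:status] == :error }
puts outcome.inspect
```

Keeping a per-branch outcome makes it possible to flag the whole process as failed while still letting the healthy branch run to completion.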

2012-11-06 23:18:45 utc

mburnett

so basically i just need to monitor the failures so that i can flag the whole process as failed?

2012-11-06 23:18:53 utc

mburnett

right

2012-11-06 23:19:02 utc

mburnett

i just need to notify users that this process has failed

2012-11-06 23:19:30 utc

jmettraux

ruote-wise, a branch of the process failed

2012-11-06 23:19:32 utc

mburnett

so your initial recommendation would basically be to just flunk and do nothing else inside ruote?

2012-11-06 23:19:43 utc

jmettraux

yes

2012-11-06 23:19:48 utc

mburnett

ok

2012-11-06 23:20:09 utc

mburnett

i plan to set up a historian service listening to messages on amqp for stuff like this

2012-11-06 23:20:19 utc

mburnett

to create entries in our existing tracking system
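A historian of this kind might look roughly like the sketch below. Everything in it is an assumption: the JSON message layout (`wfid`, `status`, `reason`) is invented for illustration, and an Array of strings stands in for the AMQP subscription.

```ruby
require 'json'

# Hypothetical historian: parses JSON messages and keeps only the failures.
# A real one would subscribe to an AMQP queue; an Array stands in here.
# The 'wfid', 'status' and 'reason' fields are assumed, not a ruote format.
def record_failures(raw_messages)
  raw_messages.map { |raw| JSON.parse(raw) }
              .select { |msg| msg['status'] == 'failed' }
              .map { |msg| { 'wfid' => msg['wfid'], 'reason' => msg['reason'] } }
end

messages = [
  JSON.generate('wfid' => 'w1', 'status' => 'done'),
  JSON.generate('wfid' => 'w2', 'status' => 'failed', 'reason' => 'grid job failed')
]

entries = record_failures(messages)
puts entries.inspect
```

Each returned entry is what would be written to the existing tracking system.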

2012-11-06 23:20:45 utc

jmettraux

as long as everything goes through AMQP, it's great

2012-11-06 23:31:52 utc

jmettraux

you're building an AMQP powered interface in front of your grid

2012-11-06 23:32:06 utc

jmettraux

your clients are ruote or whatever talks AMQP

2012-11-06 23:32:49 utc

jmettraux

services and orchestration of services

2012-11-06 23:38:58 utc

mburnett

that's right

2012-11-06 23:41:15 utc

mburnett

i really like this architecture

2012-11-06 23:42:15 utc

mburnett

is there a way to query ruote about whether a process has any possible ways to proceed without intervention? i.e. has every branch not blocked by an error completed?

2012-11-06 23:43:55 utc

mburnett

is leaves() the best attempt?

2012-11-06 23:44:39 utc

mburnett

and then check each one for error state

2012-11-07 00:05:20 utc

jmettraux

yes, leaves could help
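The check proposed above (take the leaves, then look at each one for an error state) can be sketched on a toy execution tree. `Node`, `leaves` and `stuck?` are hypothetical names for this illustration and are not ruote classes or methods. How ruote actually exposes the error on each leaf should be confirmed against its documentation.

```ruby
# Hypothetical node of an execution tree; NOT a ruote FlowExpression.
Node = Struct.new(:name, :error, :children) do
  def leaf?
    children.empty?
  end
end

# Collect every node that has no children.
def leaves(node)
  return [node] if node.leaf?
  node.children.flat_map { |child| leaves(child) }
end

# A process cannot proceed without intervention when every leaf is in error.
def stuck?(root)
  leaves(root).all? { |leaf| !leaf.error.nil? }
end

# One branch has failed, the other is still running: not stuck yet.
running = Node.new('concurrence', nil, [
  Node.new('job_a', 'grid job failed', []),
  Node.new('job_b', nil, [])
])

# Only the failed branch remains: nothing can proceed without intervention.
blocked = Node.new('concurrence', nil, [
  Node.new('job_a', 'grid job failed', [])
])

puts stuck?(running)
puts stuck?(blocked)
```

The second tree models the moment when every branch not blocked by an error has completed, which is the condition the question asks about.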