ruote tmp/log_2013-03-14.html

2013-03-14 00:20:14 utc

ypz_

what would be the correct way to get the name from a Ruote.define object? empirically, it is the value of pdef[1]['name']

2013-03-14 05:35:52 utc

jmettraux

ypz: hello, yes pdef[1]['name'] is probably the shortest way

2013-03-14 05:36:18 utc

ypz

hi,

2013-03-14 05:36:42 utc

ypz

so it is acceptable to access it this way? I was hoping there was a getter for it

2013-03-14 05:36:45 utc

jmettraux

although, in the flow, the workitem handed to participants has a #wf_name and a #wf_revision method

2013-03-14 05:36:54 utc

jmettraux

it's totally acceptable

2013-03-14 05:37:06 utc

jmettraux

process definitions are just trees

2013-03-14 05:37:14 utc

jmettraux

(once "generated")

2013-03-14 05:37:24 utc

jmettraux

feel free to wrap that in any class you like

2013-03-14 05:37:42 utc

jmettraux

the "process portfolio management" is left to integrators

2013-03-14 05:38:18 utc

jmettraux

there are people trying to build things around that: https://github.com/coffeeaddict/ruote-registry

2013-03-14 05:38:25 utc

ypz

well, I generated pdef objects and stored them in a DB, and another script (not aware of Ruote at all) reads them out of the DB directly

2013-03-14 05:39:00 utc

jmettraux

ok
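A minimal sketch of the pdef[1]['name'] access discussed above, assuming the [expression_name, attributes, children] tree shape that Ruote.define produces when the name is given as :name => '...'. The tree below is written by hand rather than generated by ruote, and TREE and pdef_name are hypothetical names. Because the tree is only arrays and hashes, the same access works on a JSON copy read back from a database by a script that never loads ruote.

```ruby
require 'json'

# Hand-written tree, assumed to mirror what ruote produces for:
#   Ruote.define :name => 'order_flow' do
#     sequence :on_error => 'error_handler' do
#       participant :ref => 'charge_card'
#     end
#   end
# Shape: [expression_name, attributes_hash, children_array]
TREE = [
  'define', { 'name' => 'order_flow' }, [
    ['sequence', { 'on_error' => 'error_handler' }, [
      ['participant', { 'ref' => 'charge_card' }, []]
    ]]
  ]
]

# Hypothetical helper: the process name from a tree or from its JSON form
# (e.g. as stored in a DB column). Returns nil when no 'name' attribute exists.
def pdef_name(tree_or_json)
  tree = tree_or_json.is_a?(String) ? JSON.parse(tree_or_json) : tree_or_json
  attributes = tree[1] || {}
  attributes['name']
end

stored = JSON.generate(TREE) # what the writing script might save
puts pdef_name(TREE)
puts pdef_name(stored)
```

Wrapping the index access in a small helper like this keeps the "trees are the API" approach while giving the reading script a single place to change if the storage format changes.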

2013-03-14 05:39:28 utc

ypz

great, thanks

2013-03-14 22:09:26 utc

ypz

hi, jmettraux

2013-03-14 22:09:40 utc

jmettraux

hello, good afternoon

2013-03-14 22:09:54 utc

ypz

what time is it at your place ?

2013-03-14 22:10:01 utc

jmettraux

0657

2013-03-14 22:10:13 utc

ypz

then good morning to you

2013-03-14 22:10:18 utc

jmettraux

you're in SF iirc

2013-03-14 22:10:21 utc

jmettraux

thanks!

2013-03-14 22:10:42 utc

ypz

yea, I am in the SF Bay area

2013-03-14 22:11:22 utc

jmettraux

how can I help you?

2013-03-14 22:11:31 utc

ypz

when I use a participant to handle on_error conditions, the process itself is removed from the engine, correct?

2013-03-14 22:12:03 utc

jmettraux

ACTION looks again at the docs

2013-03-14 22:13:06 utc

ypz

i am trying to figure out how to handle the various types of errors one may encounter while processing a workflow

2013-03-14 22:15:13 utc

jmettraux

if you use dashboard.on_error = 'participant', the process should not be removed

2013-03-14 22:15:22 utc

jmettraux

is that what you're using?

2013-03-14 22:16:15 utc

ypz

i used sequence :on_error => 'error_handler'

2013-03-14 22:16:44 utc

jmettraux

is the sequence the top "embracing" block?

2013-03-14 22:16:45 utc

phaeron

jmettraux: finally set up a staging environment where I can run stuff in a vm, under valgrind

2013-03-14 22:17:04 utc

jmettraux

phaeron: hello, good good

2013-03-14 22:17:12 utc

ypz

jmettraux, yes,

2013-03-14 22:17:32 utc

jmettraux

ypz: then the sequence will execute the participant and then be "over"

2013-03-14 22:17:57 utc

jmettraux

ypz: since it's the top sequence, the process terminates as well (unless the on_error participant doesn't reply immediately)
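A minimal sketch for checking which blocks of a stored tree carry an on_error attribute and how deeply they are nested, since the behaviour described here depends on the handler sitting on the top embracing block (depth 1, directly under define). The tree is hand-written and assumed to mirror Ruote.define output; FLOW, on_error_handlers and the participant names are hypothetical.

```ruby
# Hand-written tree, assumed to mirror Ruote.define output.
FLOW = ['define', { 'name' => 'order_flow' }, [
  ['sequence', { 'on_error' => 'error_handler' }, [
    ['participant', { 'ref' => 'charge_card' }, []],
    ['concurrence', { 'on_error' => 'notify_ops' }, [
      ['participant', { 'ref' => 'ship' }, []]
    ]]
  ]]
]]

# Hypothetical helper: collect [depth, expression_name, handler] for every
# expression carrying on_error. depth 0 is 'define', depth 1 its direct children.
def on_error_handlers(tree, depth = 0, found = [])
  name, attributes, children = tree
  handler = attributes['on_error']
  found << [depth, name, handler] if handler
  children.each { |child| on_error_handlers(child, depth + 1, found) }
  found
end

on_error_handlers(FLOW).each do |depth, name, handler|
  puts "depth #{depth}: #{name} -> #{handler}"
end
```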

2013-03-14 22:18:43 utc

jmettraux

ypz: maybe a good rule of thumb would be to deal with known errors "in participants", and let the rest of the errors jam their processes

2013-03-14 22:19:02 utc

phaeron

jmettraux: I compared the setups of the two vms (leaking vs. non-leaking) and couldn't find any difference.

2013-03-14 22:19:49 utc

jmettraux

ypz: then, once you have a good grip on the thing, you can start using those on_error constructs on blocks

2013-03-14 22:20:14 utc

jmettraux

ypz: but please experiment and have fun

2013-03-14 22:20:32 utc

jmettraux

phaeron: can you reproduce the leak?

2013-03-14 22:20:36 utc

ypz

jmettraux, by saying "deal with known errors in participants", do you mean implementing an on_error method for that participant?

2013-03-14 22:21:12 utc

jmettraux

ypz: sorry, I meant regular rescue/ensure blocks inside of the participant implementations to deal with local issues

2013-03-14 22:21:25 utc

jmettraux

ypz: those that can be handled at the participant level

2013-03-14 22:21:52 utc

jmettraux

ypz: (and that you don't want to jam their processes)
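A minimal sketch of that rule of thumb. A real ruote local participant would include Ruote::LocalParticipant and reply from its workitem handler; to keep the example runnable without the gem, Workitem is a hypothetical stand-in holding only a fields hash, and ChargeCard, PaymentDeclined, #process and the payment_* fields are invented for illustration. The known, local failure is rescued and recorded so the flow continues; any other exception propagates and would jam the process, where it can be inspected.

```ruby
# Hypothetical stand-in for Ruote::Workitem: only a fields hash.
class Workitem
  attr_reader :fields
  def initialize(fields = {})
    @fields = fields
  end
end

# Hypothetical "known" error that can be handled at the participant level.
class PaymentDeclined < StandardError; end

# Sketch of a participant; #process plays the role of ruote's workitem handler.
class ChargeCard
  def initialize(&gateway)
    @gateway = gateway # callable performing the external call
  end

  def process(workitem)
    @gateway.call(workitem.fields['amount'])
    workitem.fields['payment_status'] = 'charged'
  rescue PaymentDeclined => e
    # Known, local issue: record it in the workitem and let the flow go on.
    workitem.fields['payment_status'] = 'declined'
    workitem.fields['payment_error'] = e.message
  ensure
    # Runs on success, on the rescued error, and on errors that propagate.
    workitem.fields['payment_attempted'] = true
  end
end
```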

2013-03-14 22:23:42 utc

phaeron

jmettraux: yes, as far as I can see, but valgrind is not reporting it yet

2013-03-14 22:24:22 utc

phaeron

jmettraux: the ruote setup is a bit custom https://github.com/MeeGoIntegration/boss/blob/bundled/Gemfile.lock

2013-03-14 22:24:32 utc

phaeron

opensuse 12.1 64bit

2013-03-14 22:25:20 utc

phaeron

ruby 1.8.7

2013-03-14 22:26:34 utc

jmettraux

phaeron: this vm is a ruote-worker vm? Do you have an array of ruote worker vms? Or is it an amqp worker vm?

2013-03-14 22:27:13 utc

ypz

jmettraux, in the sequence :on_error => 'error_handler' construct, are the workitem and the error message available to the 'error_handler' participant, so it can examine what caused the error condition? my simple test error_handler just does "pp workitem" and it doesn't produce any output

2013-03-14 22:27:43 utc

phaeron

jmettraux: single ruote fs engine with one amqp worker (same vm for this test)

2013-03-14 22:28:32 utc

jmettraux

phaeron: and the leak is coming from the ruote process or the amqp worker process?

2013-03-14 22:28:50 utc

jmettraux

ypz: looking at the doc...

2013-03-14 22:29:47 utc

jmettraux

ypz: the workitem handed to the error handler should have an __error__ field; the workitem class has an #error method to get it directly

2013-03-14 22:30:44 utc

phaeron

this 'boss' script https://github.com/MeeGoIntegration/boss/blob/bundled/boss

2013-03-14 22:30:50 utc

phaeron

eventually eats lots of memory

2013-03-14 22:31:04 utc

phaeron

pmap says it is all heap

2013-03-14 22:31:47 utc

jmettraux

phaeron: that's the script that contains the ruote worker

2013-03-14 22:32:03 utc

phaeron

and initializes the engine too

2013-03-14 22:32:38 utc

phaeron

storage, I mean

2013-03-14 22:33:33 utc

jmettraux

ypz: here's a test (a bit convoluted) that leverages the #error method: https://github.com/jmettraux/ruote/blob/master/test/functional/ft_5_on_error.rb#L269-L305

2013-03-14 22:33:54 utc

jmettraux

phaeron: I'm looking forward to the valgrind results

2013-03-14 22:35:08 utc

ypz

let me look at the test

2013-03-14 22:43:53 utc

jmettraux

ypz: not sure if I should have shown this test, it's a bit raw and convoluted, it uses a stash trick, it's probably not a good example

2013-03-14 22:44:42 utc

ypz

is "stash" special in any way ?

2013-03-14 22:45:22 utc

jmettraux

yes, it's only available in ruote functional tests

2013-03-14 22:45:53 utc

ypz

is there any reason I can't extract error into from the workitem inside my error_handler, such as writing it to a log file on the file system?

2013-03-14 22:46:13 utc

ypz

s/error into/error info/

2013-03-14 22:46:21 utc

jmettraux

ypz: you should have no problem doing that
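A minimal sketch of such a handler body: read the __error__ field and append one line per error to a log. The inner structure of __error__ is not spelled out here, so the sketch serializes it with inspect instead of assuming keys. Workitem is a hypothetical stand-in whose #error mirrors the method mentioned above, and log_error and errors.log are invented names.

```ruby
require 'time'
require 'stringio'

# Hypothetical stand-in for Ruote::Workitem.
class Workitem
  attr_reader :fields
  def initialize(fields = {})
    @fields = fields
  end

  # Mirrors the idea of Workitem#error: the content of the __error__ field.
  def error
    fields['__error__']
  end
end

# Hypothetical helper: append one timestamped line per error to any IO.
# Returns false when the workitem carries no error.
def log_error(workitem, io)
  err = workitem.error
  return false if err.nil?
  io.puts("#{Time.now.utc.iso8601} #{err.inspect}")
  true
end

# In a handler this could be: File.open('errors.log', 'a') { |f| log_error(workitem, f) }
buffer = StringIO.new
log_error(Workitem.new({ '__error__' => { 'message' => 'boom' } }), buffer)
puts buffer.string
```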

2013-03-14 22:47:08 utc

jmettraux

if it doesn't work, what are the symptoms?

2013-03-14 22:47:14 utc

ypz

good to know that!

2013-03-14 22:47:30 utc

ypz

right now, I got nothing, no errors and no output

2013-03-14 22:49:13 utc

jmettraux

maybe an error in your error_handler

2013-03-14 22:49:53 utc

jmettraux

add some puts statements to determine where it stops behaving, maybe add a rescue block
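A minimal sketch of that debugging approach: trace lines around each step plus a rescue that prints the exception and then re-raises it, so a failure inside the handler shows up on stderr instead of vanishing. with_trace and the step labels are hypothetical.

```ruby
# Hypothetical helper: print entry/exit of a step and any exception it raises.
def with_trace(label, out = $stderr)
  out.puts("-> #{label}")
  result = yield
  out.puts("<- #{label}")
  result
rescue Exception => e # deliberately broad while debugging; it re-raises below
  out.puts("!! #{label}: #{e.class}: #{e.message}")
  out.puts(e.backtrace.first(5).join("\n")) if e.backtrace
  raise
end

# Usage inside a handler, one call per step:
#   with_trace('read error') { workitem.error }
#   with_trace('write log')  { File.open('errors.log', 'a') { |f| f.puts(msg) } }
```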

2013-03-14 22:50:07 utc

ypz

the doc at http://ruote.rubyforge.org/exp/on_error.html mentions (error) messages; is there any doc on how to receive such messages?

2013-03-14 22:50:08 utc

jmettraux

ascertain the thing before it goes hiding under the rug

2013-03-14 22:50:57 utc

ypz

yea, I'll try to trim my error handler to its minimum to figure out what's going on there, now I know that it should work

2013-03-14 22:50:59 utc

jmettraux

in the same way, by writing a participant or a subprocess

2013-03-14 22:59:15 utc

jmettraux

ypz: here is a simple example, it digs into workitem.error: https://gist.github.com/anonymous/5165918

2013-03-14 23:02:27 utc

ypz

alright, my absolutely bare-bones error handler is able to puts out the workitem along with the error message!

2013-03-14 23:06:10 utc

ypz

jmettraux that's plenty of info to get me going for now, thanks a lot!

2013-03-14 23:09:03 utc

jmettraux

ypz: you're welcome!

2013-03-14 23:09:27 utc

ypz

bye

2013-03-14 23:09:32 utc

jmettraux

bye!

2013-03-14 23:21:54 utc

phaeron

jmettraux: sorry, this is the script that is running https://github.com/MeeGoIntegration/boss/blob/0.8.0/boss

2013-03-14 23:25:40 utc

phaeron

it's in master now

2013-03-14 23:35:36 utc

phaeron

https://github.com/MeeGoIntegration/boss-standard-workflow/blob/master/processes/SRCSRV_REQUEST_CREATE.BOSS_handle_SR.pdef#L454

2013-03-14 23:35:50 utc

phaeron

similar constructs cause very high cpu usage

2013-03-14 23:37:05 utc

jmettraux

phaeron: it iterates on how many actions?

2013-03-14 23:37:42 utc

phaeron

varies. usually 2-4 and doesn't cause much trouble. recently a big request had about 130 actions

2013-03-14 23:38:23 utc

jmettraux

what causes the high cpu usage? What is do_wait_for_build?

2013-03-14 23:38:48 utc

phaeron

https://github.com/MeeGoIntegration/boss-standard-workflow/blob/master/processes/SRCSRV_REQUEST_CREATE.BOSS_handle_SR.pdef#L510

2013-03-14 23:39:15 utc

phaeron

is_repo_published is an amqp participant

2013-03-14 23:39:24 utc

phaeron

that checks an external system

2013-03-14 23:39:50 utc

jmettraux

can't you pinpoint the real cpu hog?

2013-03-14 23:40:30 utc

phaeron

not really. I wrote a similar smaller process and got the high cpu usage similarly

2013-03-14 23:41:27 utc

jmettraux

I'm afraid I cannot help much

2013-03-14 23:42:01 utc

phaeron

yeah I am still trying to find a single point of failure

2013-03-14 23:42:16 utc

phaeron

jmettraux: don't worry I am not giving up yet :)

2013-03-14 23:42:36 utc

jmettraux

well, lots of suspects

2013-03-14 23:43:55 utc

jmettraux

it'd be interesting to run a process that just contains an invocation to do_wait_for_build and measure

2013-03-14 23:44:15 utc

jmettraux

(just a few simplification iterations: https://gist.github.com/anonymous/5166219 )

2013-03-14 23:46:41 utc

phaeron

I am doing the last, simplest form with a dumper amqp participant, but it doesn't call the external system

2013-03-14 23:46:52 utc

phaeron

and I can see the memory increase slowly in top

2013-03-14 23:47:32 utc

jmettraux

then try removing your participant

2013-03-14 23:48:23 utc

jmettraux

if you're sure it's ruote's fault, it's pretty easy to prove it

2013-03-14 23:48:39 utc

jmettraux

without any amqp stuff

2013-03-14 23:49:08 utc

jmettraux

just write a plain ruote test case and state your measurement points and results
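A minimal sketch of stating measurement points in a plain test: sample the process's resident set size before and after each round of a workload and print the series. It assumes Linux (it reads VmRSS from /proc/self/status, which exists on the openSUSE VM mentioned earlier) and returns nil elsewhere; rss_kb and measure are hypothetical names, and the array-building block is a placeholder where a plain ruote launch-and-wait would go. RSS is a coarse signal, so a steady climb over many rounds says more than a single delta.

```ruby
# Resident set size of the current process in kB, or nil if unavailable.
def rss_kb
  path = '/proc/self/status'
  return nil unless File.exist?(path)
  line = File.readlines(path).find { |l| l.start_with?('VmRSS:') }
  line && line.split[1].to_i
end

# Run the workload `rounds` times and record RSS before and after each round.
def measure(rounds)
  samples = [rss_kb]
  rounds.times do |i|
    yield i
    GC.start # collect garbage so leftovers are less likely to look like a leak
    samples << rss_kb
  end
  samples
end

# Placeholder workload; replace with launching a process and waiting for it.
samples = measure(5) { |i| Array.new(10_000) { |j| "item #{i}-#{j}" } }
puts samples.inspect
```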

2013-03-14 23:49:13 utc

phaeron

ok

2013-03-14 23:49:28 utc

phaeron

I am still also figuring out how to produce the measurements

2013-03-14 23:49:35 utc

jmettraux

valgrind?

2013-03-14 23:49:40 utc

jmettraux

top?

2013-03-14 23:50:06 utc

jmettraux

btw, what are the symptoms of the memory leak in the wild?

2013-03-14 23:50:49 utc

phaeron

increasing memory usage , eventually swapping , and then system thrashing

2013-03-14 23:51:18 utc

phaeron

(I know what tools to use for the measurements, but not how to show them to you :) )

2013-03-14 23:51:35 utc

jmettraux

maybe you have to first prove it's ruote's fault

2013-03-14 23:51:40 utc

phaeron

I'll collect them in a report and paste them

2013-03-14 23:53:27 utc

jmettraux

in the end, I'd love to have a simple test case that tells me that ruote is leaking memory and how

2013-03-14 23:53:45 utc

phaeron

yes

2013-03-14 23:53:50 utc

jmettraux

platform details included

2013-03-14 23:54:23 utc

jmettraux

but the culprit could be somewhere around your participant

2013-03-14 23:54:37 utc

jmettraux

and also remember that you have an identical vm that doesn't leak

2013-03-14 23:54:52 utc

phaeron

but the participant is a python process that runs elsewhere

2013-03-14 23:54:53 utc

jmettraux

have you looked at the Ubuntu package versions?

2013-03-14 23:55:13 utc

phaeron

all the heap usage is by that script I linked to above (boss)

2013-03-14 23:55:20 utc

phaeron

what ubuntu package versions ?

2013-03-14 23:55:36 utc

jmettraux

the "local" participant that dispatches over AMQP, is that a vanilla ruote-amqp participant or something that you guys developed or modified?

2013-03-14 23:55:51 utc

jmettraux

are your two vms' packages identical?

2013-03-14 23:56:05 utc

jmettraux

you only showed the Gemfile

2013-03-14 23:56:08 utc

phaeron

the two vms should be identical yes

2013-03-14 23:56:16 utc

jmettraux

you didn't report the ruby patch level

2013-03-14 23:56:18 utc

jmettraux

should be

2013-03-14 23:56:43 utc

phaeron

https://github.com/MeeGoIntegration/boss/blob/master/boss#L81

2013-03-14 23:57:02 utc

phaeron

ruby 1.8.7 (2011-12-28 patchlevel 357) [x86_64-linux]

2013-03-14 23:57:17 utc

jmettraux

that's the receiver, there's also the participant involved

2013-03-14 23:57:18 utc

phaeron

same on both systems

2013-03-14 23:57:36 utc

jmettraux

are the system packages identical?

2013-03-14 23:57:50 utc

jmettraux

are the two vms running on the same host?

2013-03-14 23:57:52 utc

phaeron

yes

2013-03-14 23:57:56 utc

phaeron

not same host

2013-03-14 23:58:04 utc

phaeron

same packages installed on both sides

2013-03-14 23:58:33 utc

jmettraux

not the same host... does moving the OK vm to the NotOK host make the vm go NotOK?

2013-03-14 23:58:34 utc

phaeron

usually physical host doesn't affect vm internals

2013-03-14 23:58:51 utc

phaeron

I can't migrate the vms, at least not very easily

2013-03-14 23:59:39 utc

phaeron

if by participant you mean the remote on the other side of amqp, it is a python script, custom ruote-amqp

2013-03-14 23:59:59 utc

jmettraux

I meant the local participant, the one that places the message in AMQP

2013-03-15 00:00:13 utc

jmettraux

for that "real participant", the python one

2013-03-15 00:00:15 utc

phaeron

lbt: can you help?

2013-03-15 00:01:50 utc

phaeron

I hope he's still awake :)

2013-03-15 00:05:09 utc

phaeron

jmettraux: the launchers are also remote amqp python scripts. ruote is intermediate

2013-03-15 00:05:19 utc

jmettraux

good

2013-03-15 00:05:20 utc

phaeron

I am sorry I might be confusing you

2013-03-15 00:05:28 utc

jmettraux

no worries

2013-03-15 00:05:55 utc

lbt

hey ... sure

2013-03-15 00:06:01 utc

lbt

hi jmettraux

2013-03-15 00:06:12 utc

jmettraux

lbt: hello, good late evening

2013-03-15 00:06:12 utc

phaeron

but as far as I understand: "python launcher (process + workitem)" -> amqp -> ruote -> (python amqp participants)

2013-03-15 00:09:24 utc

lbt

so just catching up on backlog

2013-03-15 00:11:52 utc

lbt

so the process that grows is the "boss" script in that ^^ url

2013-03-15 00:13:19 utc

lbt

and it is essentially just a wrapper around ruote Dash/Worker/FsStorage

2013-03-15 00:13:52 utc

lbt

phaeron: I wonder if we could use a different storage?