lib: track worst-case # of cycles and don't allow granularity to exceed it

* The workqueue code at present errs towards optimising the granularity
  for throughput of queue items in runs, perhaps at the cost of risking
  excessive delays at times.  Make the workqueue take worst cases into
  account.

* thread.c: (thread_should_yield) When the thread should yield, return
  the time taken, which we get for free and which may be useful to
  callers such as work_queue_run.
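  A minimal sketch of the changed return convention, assuming a
  hypothetical yield limit and function name (the real Quagga code
  reads its clock internally):

  ```c
  #include <assert.h>

  /* Hypothetical limit on how long a thread may run before yielding,
   * in microseconds; a sketch only, not Quagga's actual value. */
  #define YIELD_LIMIT_USEC 10000UL

  /* Return 0 if the thread need not yield yet; otherwise return the
   * time consumed so far (usec), so the caller gets it for free. */
  static unsigned long
  thread_should_yield_sketch (unsigned long start_usec,
                              unsigned long now_usec)
  {
    unsigned long taken = now_usec - start_usec;
    return (taken > YIELD_LIMIT_USEC) ? taken : 0;
  }

  int
  main (void)
  {
    assert (thread_should_yield_sketch (0, 5000) == 0);      /* under limit */
    assert (thread_should_yield_sketch (0, 15000) == 15000); /* over: time back */
    return 0;
  }
  ```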

* workqueue.h: (struct work_queue) Add fields for the worst # of cycles
  and (independently) the worst time taken.

* workqueue.c: (work_queue_new) Worst starts high.
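  A sketch of the two added fields and the "worst starts high"
  initialisation; field and struct names here are hypothetical, not the
  actual workqueue.h definitions:

  ```c
  #include <assert.h>
  #include <limits.h>
  #include <string.h>

  /* Sketch of the fields added to struct work_queue. */
  struct work_queue_sketch
  {
    unsigned int granularity;   /* items to process per run */
    unsigned int cycles_worst;  /* worst # of cycles before a forced yield */
    unsigned long yield_worst;  /* worst delay (usec) seen at a yield */
  };

  /* "Worst starts high": initialise the worst cycle count to a large
   * sentinel so the first real forced yield can only lower it. */
  static void
  work_queue_new_sketch (struct work_queue_sketch *wq)
  {
    memset (wq, 0, sizeof (*wq));
    wq->granularity = 50;
    wq->cycles_worst = UINT_MAX;
  }

  int
  main (void)
  {
    struct work_queue_sketch wq;
    work_queue_new_sketch (&wq);
    assert (wq.cycles_worst == UINT_MAX);
    assert (wq.yield_worst == 0);
    return 0;
  }
  ```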

  (work_queue_run) Track the worst number of cycles taken where a
  queue run had to yield before clearing the queue.  Use this as an
  upper bound on the granularity, so the granularity can never increase
  beyond the worst case observed.

  Track the worst-case delay per work queue where it had to yield,
  thanks to the thread_should_yield return value change.  Note that
  "show thread cpu" already shows stats for the work_queue_run function,
  including average and worst cases.
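  The adjustment described above can be sketched as follows; the
  function name and the doubling heuristic are illustrative assumptions,
  not the literal work_queue_run code:

  ```c
  #include <assert.h>
  #include <limits.h>

  /* Sketch: track the worst cycle count at which a run was forced to
   * yield, and never let granularity grow beyond that worst case. */
  static unsigned int
  adjust_granularity (unsigned int granularity, unsigned int cycles,
                      unsigned int *cycles_worst, int yielded)
  {
    if (yielded && cycles < *cycles_worst)
      *cycles_worst = cycles;       /* new worst case observed */

    /* Queue cleared without yielding: try a bigger batch next time,
     * but the worst observed cycle count is a hard upper bound. */
    if (!yielded)
      granularity *= 2;
    if (granularity > *cycles_worst)
      granularity = *cycles_worst;

    return granularity;
  }

  int
  main (void)
  {
    unsigned int worst = UINT_MAX;  /* "worst starts high" */
    unsigned int g = 50;

    g = adjust_granularity (g, 50, &worst, 0);  /* no yield: grow */
    assert (g == 100);

    g = adjust_granularity (g, 80, &worst, 1);  /* yielded after 80 cycles */
    assert (worst == 80 && g == 80);

    g = adjust_granularity (g, 200, &worst, 0); /* growth capped at worst */
    assert (g == 80);
    return 0;
  }
  ```

  The point of the cap is that growth is one-way bounded: a run that
  clears the queue can still enlarge the batch, but never past a size
  that has already been observed to cause a forced yield.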

Deficiencies:

- A spurious outside delay (e.g. the process not being scheduled for
  ages) could cause 'worst' to end up very low in some particular
  invocation of a process, and it will stay that way for the life of
  the process.

- The whole business of trying to calculate suitable granularities is
  fragile and impossible to get 100% right.
3 files changed