[Skip Navigation] [CSUSB] / [CNS] / [CSE] / [R J Botting] / [Samples] / hours
[Index] [Contents] [Source Text] [About] [Notation] [Copyright] [Comment/Contact] [Search ]
Wed Nov 14 16:12:37 PST 2012


    Mathematics in Action: What jobs have been running too long


      This a real problem.


      As part of my task of keeping an FTP server running I needed to be able to terminate FTP sessions that have been running for more than a certain number of hours - which I call the 'grace period'. I can get a list of these sessions with either the date or the time when they started in particular position. If there is a date then the session is more than 24 hours long and needs killing. When there is a start time then this is a number between 0 and 23 inclusive and indicates the time on a twenty-four hour clock. I can also get hold of the time now on the same twenty-four hour clock. All these times are whole numbers: 8 means 8 in the morning and 20 means 8 in the evening. 23 means 11 at night and 0 stands for midnight.

      My problem is to get a function that tells me whether or not a the difference between two 24-hour times is greater than a given grace period.


      When the grace period is 4 hours and a job started at 4 (24 hour clock time) then it needs to be killed when the 24 hour time is 8. If it starts at 11pm or 23 (on the 24 hour clock) then it needs to be stopped at 3am.

      Quality Requirement

      We do not want to be sued for stopping jobs before the announced time. But we can not afford to let jobs accumulate.... the system crashes.

      Formal Analysis

      We will use the C/C++ notation for remainder or modulus:

      Use C.expressions. (above) |-(mod): For int i,j>0, i%j::=i - j*(i/j).

    1. (The|-symbol is used to indicate that a formaula has been asserted to be true. It may be preceeded by a list of reasons.)

      We can show that

    2. (mod)|- (r0): For i,j>0, i=i%j+j*(i/j).
    3. (mod)|- (r1): For i,j>0, for some k(i=k*j+i%j)).


      1. s::int=the start time,
      2. S::0..23=the start time on the 24 hour clock.
      3. t::int=the time now,
      4. T::0..23=the time, now, on the 24 hour clock.
      5. g::0..23=the grace period in hours.

      6. D::int=T-S, the difference between to 24 hour clock times.
      7. d::int=t-s, the real time difference in hours.

        Given D and g we want to determine if d > g or if d<=g.

        We have been told:

      8. (given)|- (1): T = t % 24
      9. (given)|- (2): S = s % 24
      10. (given)|- (3): 0 <= t-s and t-s < 24.

        So we deduce:

      11. (1, 2)|- (4): (T-S)%24 = (t-s)%24.
      12. (4, r1)|- (5): for some k, d = D + 24*k
      13. (1, 2)|- (6): 0 <= T and S < 24,
      14. (6, 3)|- (7): d in { D, D+24, D+48, ...} and 0..23
      15. = { D, D+24 }.


      16. (above)|- (D0): D<0 or D=0 or D>0.
      17. (7, 5)|- (11): if D=0 then d<g.
      18. (6, 7)|- (15): if D<0 then ( d<g iff D+24<g ).
      19. (7)|- (18): if D>0 then (d<g iff D<g).

      20. (D0, 11, 15, 18)|- (20): if D<0 then (d<g iff D+24<g) else (d<g iff D<g).

        Proof of 5


        1. (4)|- (5.0): (T-S)%24 = (t-s)%24.
        2. (5.0, D, d)|- (5.1): D%24=d%24,
        3. (r1, 5.1, ei, k=>k1, ei, k=>k2)|- (5.2): D+k1*24=d+24*k2,
        4. (5.2)|- (5.3): D= -k1*24+d+24*k2,
        5. (5.3)|- (5.4): D= d+24*(k2-k1),
        6. (5.4, eg, k=>k2-k1)|- (5.5): D= d+24*k,

        (Close Let )

        Proof of 11


        1. |- (8): D=0.
        2. (8, 7, 3)|- (9): d=0,
        3. (9)|- (10): d<g.

        (Close Let )

        Proof of 15


        1. |- (12): D<0,
        2. (12, 6)|- (13): D> -24,
        3. (13, 7)|- (14): d = D+24,

        (Close Let )

        Proof of 18


        1. |- (16): D>0.
        2. (16, 7)|- (17): d=D.

        (Close Let )

      (End of Net)

      C and C++ Code

      We would therefore use the following function
      	int hangable(int T, int S, int g)
      		if (T-S < 0)
      			return T-S+24 < g;
      			return T-S < g;

      Actual Implementation

      The "ftp.hangman" program is a shell script:
      	: Hang up day old FTP sessions.  Uses BSD -u option, RTFM
      	ps -axu >$tmp
      	grep "^ftp" <$tmp |
      	awk '$9!~/:/{print $2}' |
      	while read pid; do kill -9 $pid; done
      	rm $tmp

      This works because the set of running jobs is listed in tmp by the ps -axu command. The first "word" on a line is the owner of the job, and so grep "^ftp" extracts only the FTP processes. The ps -axu format puts the time or month as the 9th word on the line and the process Id as the second word. Hence the awk command selects the 24 or more hour old processes and outputs their process Identifiers. This set of Ids (the old ftp processes) are read in, one at a time by the while - do - done loop and killed.

      Clearly the only change is check for the presence of a ":" in the 9th word, extract the hour (when it started) and compare:

        awk '$9!~/:/{print $2}
      		if(S-T>=g || S-T<0 && (S-T+24)>g)print $2;
      	      }' g=$grace T=`date +%H`


      There is a hopeful theory that a program that is properly developed does not need testing. Given the unprovable state of most software tools this is a dangerous theory. In this case testing showed up a bug caused by the weak typing in 'awk'. The correct statement to select outstanding jobs is:
                if(S-T>=0+g || S-T<0 && (S-T+24)>0+g)print $2;
      since g is interpreted as a string and so lexographic rather than numeric ordering is used in the comparisons.


      Notice that the condition was not wrong. It is the coding that fell into a well known and avoidable trap.

    . . . . . . . . . ( end of section What jobs have been running too long) <<Contents | End>>


    (ei): The "existential instanciation" rule of inference [ ei in logic_25_Proofs ]
    (eg): The "existential generalization" rule of inference [ eg in logic_2_5Proofs ]
  1. given::=a reason for assuming something.
  2. above::=short hand for using all previous assertions as the reason for the next one.