Wed Nov 14 16:12:37 PST 2012

# Mathematics in Action: What jobs have been running too long

## Note

This is a real problem.

## Background

As part of my task of keeping an FTP server running, I needed to be able to terminate FTP sessions that have been running for more than a certain number of hours, which I call the 'grace period'. I can get a list of these sessions with either the date or the time when they started in a particular position. If there is a date then the session is more than 24 hours old and needs killing. When there is a start time, it is a number between 0 and 23 inclusive that indicates the hour on a twenty-four hour clock. I can also get hold of the time now on the same twenty-four hour clock. All these times are whole numbers: 8 means 8 in the morning and 20 means 8 in the evening; 23 means 11 at night and 0 stands for midnight.

My problem is to get a function that tells me whether or not the difference between two 24-hour times is greater than a given grace period.

## Examples

When the grace period is 4 hours and a job started at 4 (24 hour clock time) then it needs to be killed when the 24 hour time is 8. If it starts at 11pm or 23 (on the 24 hour clock) then it needs to be stopped at 3am.

## Quality Requirement

We do not want to be sued for stopping jobs before the announced time. But we cannot afford to let jobs accumulate: if they do, the system crashes.

## Formal Analysis

We will use the C/C++ notation for remainder or modulus:

Use C.expressions. (above) |-(mod): For int i,j>0, i%j::=i - j*(i/j).

1. (The |- symbol is used to indicate that a formula has been asserted to be true. It may be preceded by a list of reasons.)

We can show that

2. (mod)|- (r0): For i,j>0, i=i%j+j*(i/j).
3. (mod)|- (r1): For i,j>0, for some k (i=k*j+i%j).
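
The definition above is stated only for positive i and j, but the clock difference T-S used below can be negative, and in C (since C99) the % operator truncates toward zero, so (-4)%24 is -4 rather than 20. The following is a minimal sketch, not part of the original program, of a helper that always yields a value in 0..23; the name `mod24` is an assumption made for illustration.

```c
#include <assert.h>

/* Hypothetical helper: remainder of i modulo 24, always in 0..23.
   C's % truncates toward zero, so a negative i needs a correction. */
int mod24(int i)
{
    int r = i % 24;      /* r is in -23..23 */
    if (r < 0)
        r += 24;         /* shift negative remainders into 0..23 */
    assert(0 <= r && r < 24);
    return r;
}
```

For example, mod24(3 - 23) is 4, which matches a job that started at 23 and is examined at 3.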

Net

1. s::int=the start time,
2. S::0..23=the start time on the 24 hour clock.
3. t::int=the time now,
4. T::0..23=the time, now, on the 24 hour clock.
5. g::0..23=the grace period in hours.

6. D::int=T-S, the difference between two 24 hour clock times.
7. d::int=t-s, the real time difference in hours.

Given D and g we want to determine whether d < g or d >= g.

We have been told:

8. (given)|- (1): T = t % 24
9. (given)|- (2): S = s % 24
10. (given)|- (3): 0 <= t-s and t-s < 24.

So we deduce:

11. (1, 2)|- (4): (T-S)%24 = (t-s)%24.
12. (4, r1)|- (5): for some k, d = D + 24*k
13. (1, 2)|- (6): 0 <= T < 24 and 0 <= S < 24, so -24 < D < 24,
14. (5, 6, 3)|- (7): d in { ..., D-24, D, D+24, D+48, ...} and d in 0..23,
15. so d in { D, D+24 }.

Now

16. (above)|- (D0): D<0 or D=0 or D>0.
17. (7, 5)|- (11): if D=0 then d<g.
18. (6, 7)|- (15): if D<0 then ( d<g iff D+24<g ).
19. (7)|- (18): if D>0 then (d<g iff D<g).

20. (D0, 11, 15, 18)|- (20): if D<0 then (d<g iff D+24<g) else (d<g iff D<g).
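
Results (11), (15) and (18) say that, under the given (3), the real elapsed time is d = D when D >= 0 and d = D+24 when D < 0. A minimal sketch of that case split in C follows; the function name `elapsed_hours` is hypothetical and does not appear in the original program.

```c
#include <assert.h>

/* Hypothetical helper: recover the real elapsed hours d from the two clock
   readings, assuming given (3): the session is less than 24 hours old. */
int elapsed_hours(int T, int S)
{
    int D = T - S;                 /* clock difference, in -23..23 */
    assert(0 <= T && T < 24 && 0 <= S && S < 24);
    return (D < 0) ? D + 24 : D;   /* D < 0: case (15); otherwise (11), (18) */
}
```

For the two cases in the Examples section, elapsed_hours(8, 4) and elapsed_hours(3, 23) both give 4.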

## Proof of 5

Let

1. (4)|- (5.0): (T-S)%24 = (t-s)%24.
2. (5.0, D, d)|- (5.1): D%24=d%24,
3. (r1, 5.1, ei, k=>k1, ei, k=>k2)|- (5.2): D+k1*24=d+24*k2,
4. (5.2)|- (5.3): D= -k1*24+d+24*k2,
5. (5.3)|- (5.4): D= d+24*(k2-k1),
6. (5.4, eg, k=>k2-k1)|- (5.5): D= d+24*k,

(Close Let )

## Proof of 11

Let

1. |- (8): D=0.
2. (8, 7, 3)|- (9): d=0,
3. (9)|- (10): d<g.

(Close Let )

## Proof of 15

Let

1. |- (12): D<0,
2. (12, 6)|- (13): D> -24,
3. (13, 7)|- (14): d = D+24,

(Close Let )

## Proof of 18

Let

1. |- (16): D>0.
2. (16, 7)|- (17): d=D.

(Close Let )

(End of Net)

## C and C++ Code

We would therefore use the following function:
`	/* Nonzero while the elapsed time d is still less than the grace period g. */`
`	int hangable(int T, int S, int g)`
`	{`
`		if (T-S < 0)`
`			return T-S+24 < g;`
`		else`
`			return T-S < g;`
`	}`
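
Because there is only a small, finite number of combinations of start time, elapsed time and grace period, the function can also be checked exhaustively against the specification d < g. The sketch below repeats hangable so that it stands alone; `count_mismatches` is a hypothetical name, and the loop bounds assume non-negative whole-hour times satisfying the givens (1) to (3).

```c
#include <assert.h>

/* The function from the text, repeated so this sketch is self-contained. */
int hangable(int T, int S, int g)
{
    if (T - S < 0)
        return T - S + 24 < g;
    else
        return T - S < g;
}

/* Hypothetical checker: counts the cases where hangable disagrees with d < g. */
int count_mismatches(void)
{
    int s, d, g, bad = 0;
    for (s = 0; s < 48; s++)              /* start times spanning two days */
        for (d = 0; d < 24; d++)          /* given (3): 0 <= d < 24 */
            for (g = 0; g < 24; g++) {    /* every grace period in 0..23 */
                int t = s + d;            /* the real time now */
                int T = t % 24;           /* given (1) */
                int S = s % 24;           /* given (2) */
                assert(0 <= T && T < 24 && 0 <= S && S < 24);
                if (hangable(T, S, g) != (d < g))
                    bad++;
            }
    return bad;
}
```

Under these assumptions count_mismatches() should return 0. Note that hangable(8, 4, 4) is 0: a job started at 4 with a 4-hour grace period is no longer within its grace period at 8.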

## Actual Implementation

The "ftp.hangman" program is a shell script:
`	: Hang up day old FTP sessions.  Uses BSD -u option, RTFM`
`	tmp=/tmp/ftp.hang.$$`
`	ps -axu >$tmp`
`	grep "^ftp" <$tmp |`
`	awk '$9!~/:/{print $2}' |`
`	while read pid; do kill -9 $pid; done`
`	rm $tmp`

This works because the set of running jobs is listed in tmp by the ps -axu command. The first "word" on each line is the owner of the job, so grep "^ftp" extracts only the FTP processes. The ps -axu format puts the time or month as the 9th word on the line and the process Id as the second word. Hence the awk command selects the processes that are 24 or more hours old (those whose 9th word contains no ":") and outputs their process identifiers. This set of Ids (the old FTP processes) is read in, one at a time, by the while - do - done loop and killed.
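
The test on the 9th word can be mirrored in C. The following sketch, which is not part of the original script, assumes the field is either a date with no ":" or a 24-hour time of the form HH:MM as described in the Background; the name `start_hour` and its return codes are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the start hour (0..23) when the field looks
   like "HH:MM", -1 when it has no ':' (a date, so 24 or more hours old),
   and -2 when it has a ':' but is not of the assumed shape. */
int start_hour(const char *field)
{
    const char *colon;
    char *end;
    long hour;

    assert(field != NULL);
    colon = strchr(field, ':');
    if (colon == NULL)
        return -1;                    /* no ':' means a date, not a time */
    hour = strtol(field, &end, 10);   /* digits before the ':' */
    if (end == field || end != colon || hour < 0 || hour > 23)
        return -2;                    /* malformed: let the caller decide */
    return (int)hour;
}
```

Keeping the malformed case separate from the date case means an unexpected field is never silently treated as a session that must be killed.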

Clearly the only change needed is to check for the presence of a ":" in the 9th word, extract the hour (when it started), and compare:

`  awk '$9!~/:/{print $2}`
`	$9~/:/{S=substr($9,1,2);`
`		if(T-S>=g || T-S<0 && (T-S+24)>g)print $2;`
`	      }' g=$grace T=`date +%H``

## Testing

There is a hopeful theory that a program that is properly developed does not need testing. Given the unprovable state of most software tools this is a dangerous theory. In this case testing showed up a bug caused by the weak typing in 'awk'. The correct statement to select outstanding jobs is:
`          if(T-S>=0+g || T-S<0 && (T-S+24)>0+g)print $2;`
since g is interpreted as a string and so lexicographic rather than numeric ordering is used in the comparisons.
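
The same trap can be reproduced outside awk. The sketch below is an analogy in C rather than a model of awk's comparison rules: it contrasts comparing two decimal strings character by character with comparing their numeric values. The function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Lexicographic comparison: what a string comparison of the digits does. */
int string_greater(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) > 0;
}

/* Numeric comparison: what the test on the grace period is meant to do. */
int numeric_greater(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strtol(a, NULL, 10) > strtol(b, NULL, 10);
}
```

With a = "9" and b = "10", string_greater reports that "9" is the greater (because '9' sorts after '1'), while numeric_greater reports that it is not.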

## Conclusion

Notice that the condition was not wrong. It is the coding that fell into a well-known and avoidable trap.


# Notes

(ei): The "existential instantiation" rule of inference [ ei in logic_25_Proofs ]
(eg): The "existential generalization" rule of inference [ eg in logic_25_Proofs ]
1. given::=a reason for assuming something.
2. above::=short hand for using all previous assertions as the reason for the next one.