STATS

From jmips

Project metrics summary

Project analysis by [cccc] Feb 29 2012

Metric	Tag	Overall	Per Class
Number of classes (modules)	NOM	58
Lines of Code	LOC	3636	62.690
McCabe's Cyclomatic Number	MVG	728	12.552
Lines of Comment	COM	2372	40.897
LOC/COM	L_C	1.533
MVG/COM	M_C	0.307
Information Flow measure (inclusive)	IF4	1909	32.914
Information Flow measure (visible)	IF4v	1909	32.914
Information Flow measure (concrete)	IF4c	1	0.017

The following explanatory rubric is largely reproduced as-is from the cccc output:

NOM = Number of classes: Number of non-trivial classes identified by the analysis.
LOC = Lines of Code: Number of non-blank, non-comment lines of source code counted by the analysis.
COM = Lines of Comments: Number of lines of comment identified by the analysis.
MVG = McCabe's Cyclomatic Complexity: A measure of the decision complexity of the functions which make up the program.The strict definition of this measure is that it is the number of linearly independent routes through a directed acyclic graph which maps the flow of control of a subprogram. The analysis counts this by recording the number of distinct decision outcomes contained within each function, which yields a good approximation to the formally defined version of the measure.
L_C = Lines of code per line of comment: Indicates density of comments with respect to textual size of program
M_C = Cyclomatic Complexity per line of comment: Indicates density of comments with respect to logical complexity of program
IF4 = Information Flow measure: Measure of information flow between classes as suggested by Henry and Kafura. The analysis makes an approximate count of this by counting inter-module couplings identified in the module interfaces.

Commentary

The measures overall are very good, showing a high comment to code ratio (2 comments for every 3 lines of code) and a small number of lines (68) per class, which is short enough to fit nearly all of a class on a standard page (53 lines) so certainly individual methods will fit on a page, on average. Bearing in mind that there are certainly some very long classes and methods, however, these averages are intrinsically misleading because of the variance. There must also be some very short methods and classes!

The average displayed here weights a class by its number of lines, so one class of 100 lines and four classes of 5 lines each would work out to an average of 24 lines per class, which doesn't really capture the idea of "most classes are very short, and one is way too long".

The complexity measures the number of alternate paths through a program. In general every different path deserves a comment, so the measure says that every path gets three lines of due comment. That is very reasonable, but it is clear from that alone if the comments are in the right place. The comments might be all block-header, for example, with nothing commenting sites of difficulty in situ.

The information flow measures are unusually high, reflecting poor separations. That is to be expected, since the model models a real physical situation in which everything is connected!

COCOMO estimate

The COCOMO estimate is as follows based on 3.7KLOC, complexity C=3.6, S=1.2 for systems programming code (high complexity), and a factor M consisting of a robustness factor of 1.5 (high reliability) times a factor of 1.0 for an "adequate development environment" and a team rating of 4 (experienced engineer!) resulting in an advantage factor of 0.33 (small project), giving

 M = 1.5 * 1.0 * 0.33 = 0.5

and producing an effort estimate

 SEM = C * KLOC^S * M = 3.6 * 3.7^1.2 * 0.5 = 8.65

Thus the project is estimated by COCOMO at having required 8.65 months of software engineering effort. At say, a seat-cost of 50000 GBP/annum, that is a worth of 36000 GBP.

However, the estimate is highly sensitive to factors such as the working tools .. If one were to suppose that the development environment were in fact excellent, not adequate, then M would be halved (replace "1.0" by "0.5") and the cost estimate would be halved:

 SEM = 4.325

So the author should have asked for 18000 pounds to break even. Assuming a profit margin of 50% (or deadline bonus ..) the asking price should have been 27000 pounds.

In fact, the first code entry (Debug.java!) for jMIPS 1.0 appears to have been February 24 2010, and there are entries going up to May 29 2010 (CPU5.java) for it. That is three months. At that time the project consisted of 5218 LOC and 3443 lines of comments. So the code has reduced since (by dint of hard work! Duplications have been removed).

It would have been completed 1.3 months ahead of time if the COCOMO estimate had been used in the bidding. The "seat costs" are the costs of keeping authors alive (paying salary) plus running overheads, which include things like office rent and power. 50000 pounds per annum is on the low side for that nowadays.

We may repeat the calculation taking into account the older figures for lines of code, supposing that at that time the code was of lesser quality, with a robustness factor of "1.0" instead of "1.5", and that gives for 5.2KLOC:

 M = 1.0 * 0.5 * 0.33 = 0.166

and

 SEM = C * KLOC^S * M = 3.6 * 5.2^1.2 * 0.166 = 4.32

By May 16 2010, all the project documentation (simulator[1-5].html, README, etc) had also been written. It looks to have been as complete as it should have been.

So the estimate is the same whichever way one cuts it. It should have taken something more than four months to develop, and it was developed in three. If it had been costed according to COCOMO, a profit would have been made.

Measures per class

This is the analysis per class. The analysis software incorrectly included certain system classes (PrintStream, Exception, RandomAccessFile, etc) which have been left out here but which may have skewed the analysis figures overall.

It is also clear that there are other miscounts. "Read" is measured at zero lines! But then there are multiple interior classes named "Read", one in each "Pipe", of which there are also multiples, one for each CPU. The analysis software has become confused. I have removed the "zero lines" classes from the table, for clarity.

Certain classes which had no real existence (e.g.: "Shdr[]") were also included by the analysis, and I have removed those from the table.

You may wish to redo the analysis one model at a time.

Module Name	LOC	MVG	COM	L_C	M_C
ALU	193	86	44	4.386	1.955
AluReturnT	14	0	23	------	------
CPU1	275	65	225	1.222	0.289
CPU2	26	0	25	1.040	------
CPU3	29	0	26	1.115	------
CPU4	29	0	28	1.036	------
CPU5	91	16	46	1.978	0.348
Cache	113	22	106	1.066	0.208
Clock	95	16	98	0.969	0.163
Cmdline	124	44	50	2.480	0.880
Console	41	6	21	1.952	0.286
Console5	45	6	20	2.250	0.300
Cpu1	69	9	47	1.468	0.191
Cpu2	69	9	45	1.533	0.200
Cpu3	69	9	45	1.533	0.200
Cpu4	72	9	47	1.532	0.191
Cpu5	79	9	48	1.646	0.187
Debug	528	228	230	2.296	0.991
DecodeReturnT	41	0	54	0.759	------
Depend	87	37	32	2.719	1.156
DependReturnT	3	0	8	------	------
Elf	384	79	136	2.824	0.581
Globals	258	0	179	1.441	------
IOBus	1	0	3	------	------
IOUnit	2	0	12	------	------
InstructionRegister	140	0	58	2.414	------
Keyboard	63	7	49	1.286	0.143
Memory	2	0	15	------	------
OptionT	13	0	13	------	------
Page	5	0	17	------	------
Phdr	23	1	22	1.045	------
Pipe	53	0	84	0.631	------
Port	8	0	5	------	------
RegReturnT	12	0	17	------	------
Region	23	2	10	2.300	------
Register	23	4	25	0.920	------
RegisterUnit	72	9	77	0.935	0.117
Screen	58	5	67	0.866	0.075
Shdr	30	1	43	0.698	------
SignExtend	6	0	23	------	------
StopClock	9	0	15	------	------
Utility	197	19	176	1.119	0.108

The analysis marks the Debug class as too long and too complex! That is not an issue - it does not affect the functioning of the software. The complexity is in its printout routines. On the other hand, one of the few bugs reported traces back to incorrect Debug class use, so perhaps the analysis is correct!

Delta source analysis

The following graph shows numbers of lines changed in the experimental (v1.8) branch over a one-month period.

Contributor "ptb" has about 5000 lines of changes in 600 hours, or about 8.4 lines changed per hour, which works out as 200 lines of changes every day. These are mostly not new lines, however, but fixes for old lines.

Contributor "fse6969" makes about 1 line change per hour, or 24 lines changed per day. This is about the expected level for continuous (perfective) maintenance activity.

Contributions from other contributors have been sporadic in the period. A new file contribution from "fse0724" is good development activity, but not followed up with maintenance engineering beyond an initial first adjustment.

A similar graph for contributions to the project wiki is shown below: