
Analysis of Tool Logs


Many software systems used by software engineers generate logs of some form or
another. For example, automatic build tools often leave records, as do source code
control systems. Some organizations build sophisticated logging into a wide spectrum
of tools so they can better understand the support needs of their software engineers.

Such tool logs can be analyzed in the same way as data from tools that have been
deliberately instrumented by the researchers. The distinction is merely that, for this
independent technique, the researchers do not have control over the kind of
information collected. This technique is also similar to analysis of databases of work
performed, except that the latter includes data manually entered by software engineers.

The analysis of tool logs has become a very popular area of research within
software engineering. Besides the examples provided below, see the proceedings
of the International Workshops on Mining Software Repositories.

Advantages: The data is already in electronic form, making it easier to code and
analyze. The behaviour being logged is part of software engineers' normal work
routine.

Disadvantage: Companies tend to use different tools in different ways, so it is
difficult to gather data consistently when using this technique with multiple
organizations.

Examples: Wolf and Rosenblum (1993) analyzed the log files generated by build
tools. They developed tools to automatically extract information about relevant
events from these files. This data was entered into a relational database along with
information gathered from other sources.
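
The general idea of extracting events from a build log and loading them into a relational database can be sketched roughly as follows. This is only an illustration, not the tooling used by Wolf and Rosenblum: the log line format, the `build_events` table, and the function names are all assumptions made for the example. Lines that do not match the assumed format are counted rather than silently dropped, so the amount of unparsed data can be reported.

```python
import re
import sqlite3

# Hypothetical log format (an assumption, not taken from the cited study):
#   <timestamp> <EVENT> <target> <OK|FAIL>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<event>[A-Z]+)\s+(?P<target>\S+)\s+(?P<status>OK|FAIL)$"
)


def parse_build_log(lines):
    """Return (events, skipped): parsed tuples and the number of unparseable lines."""
    events, skipped = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines entirely
        match = LINE_RE.match(line)
        if match is None:
            skipped += 1  # keep a count so missing data can be reported
            continue
        events.append(
            (match["ts"], match["event"], match["target"], match["status"])
        )
    return events, skipped


def load_events(conn, events):
    """Store parsed events in a (hypothetical) build_events table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS build_events "
        "(ts TEXT, event TEXT, target TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO build_events VALUES (?, ?, ?, ?)", events)
    conn.commit()


sample_log = [
    "2024-01-15T10:32:07 COMPILE src/parser.c OK",
    "2024-01-15T10:32:09 COMPILE src/lexer.c FAIL",
    "make: *** [all] Error 1",  # does not match the assumed format
    "2024-01-15T10:35:41 LINK bin/app OK",
]

events, skipped = parse_build_log(sample_log)
conn = sqlite3.connect(":memory:")
load_events(conn, events)
failures = conn.execute(
    "SELECT COUNT(*) FROM build_events WHERE status = 'FAIL'"
).fetchone()[0]
print(f"{len(events)} events loaded, {skipped} lines skipped, {failures} failures")
```

Once the events are in a table, they can be joined with data from other sources using ordinary SQL queries.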
In one of our studies (Singer et al., 1997) we looked at logs of tool usage collected
by a tools group to determine which tools software engineers throughout the
company (as opposed to just the group we were studying) were using the most. We
found that search and Unix tools were used particularly often.
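
A tally of this kind is simple to compute once the logs are in a tabular form. The sketch below assumes a hypothetical CSV layout with `timestamp`, `user`, and `tool` columns; the actual format of the logs in our study is not described here, and the sample rows are invented purely for illustration.

```python
import csv
import io
from collections import Counter


def tool_usage_counts(log_file):
    """Count invocations per tool from a CSV log with a 'tool' column (assumed layout)."""
    reader = csv.DictReader(log_file)
    return Counter(row["tool"] for row in reader)


# Invented sample data standing in for a real usage log.
sample = io.StringIO(
    "timestamp,user,tool\n"
    "2024-01-15T09:00:00,u1,grep\n"
    "2024-01-15T09:01:10,u2,ls\n"
    "2024-01-15T09:02:30,u1,grep\n"
    "2024-01-15T09:03:05,u3,emacs\n"
    "2024-01-15T09:04:45,u2,grep\n"
    "2024-01-15T09:05:20,u3,ls\n"
)

counts = tool_usage_counts(sample)
top = counts.most_common(2)  # the two most frequently used tools
print(top)
```

Counting invocations is only one possible measure; counting distinct users per tool would answer a slightly different question about how widely a tool is used.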
Herbsleb and Mockus (2003) used data generated by a change management
system to better understand how communication occurs in globally distributed
software development. They used several modeling techniques to understand the
relationship between the modification request interval and other variables, including
the number of people involved, the size of the change, and the distributed nature of
the groups working on the change. Herbsleb and Mockus also used survey data to
elucidate and confirm the findings from the analysis of the tool logs. In general, they
found that distributed work introduces delay. They proposed some mechanisms that
they believe influence this delay, primarily that distributed work involves more
people, so change requests take longer to complete.

Reporting guidelines: As with instrumentation, the exact nature of what is being
collected needs to be specified, along with any special concerns, such as missing data.
Additionally, if the data is processed in any way, the processing needs to be explained.
