In most large software engineering organizations, the work performed by developers
is carefully managed using issue tracker, problem reporting, change request and
configuration management systems. These systems require software engineers to
input data, such as a description of a problem encountered, or a comment when
checking in a source code module. The copious records generated by such systems
are a rich source of information for software engineering researchers. For further
examples beyond those given below, see the proceedings of the International Workshops
on Mining Software Repositories.
Advantages: A large amount of data is often readily available. The data is stable and
is not influenced by the presence of researchers.
Disadvantages: There may be little control over the quantity and quality of information
manually entered about the work performed. For example, we found that descriptive
fields are often not filled in, or are filled in different ways by different developers.
It is also difficult to gather additional information about a record, especially if it is
very old or the software engineer who worked on it is no longer available.
Examples: Work records can be used in a number of ways. Pfleeger and Hatton (1997)
analyzed reports of faults in an air traffic control system to evaluate the effect of adding
formal methods to the development process. Each module in the software system was
designed using one of three formal methods or an informal method. Although the code
designed using formal methods tended to have fewer faults, the results were not compelling
even when combined with other data from a code audit and unit testing.
Researchers at NASA (1998) analyzed data from various projects to study how
COTS (commercial off-the-shelf) software can be used effectively in software
engineering. The result was an extensive report recommending how to improve
processes that use COTS.
Mockus et al. (2002) used data from email archives (among several other data
sources) to understand processes in open source development. Because the
developers rarely, if ever, meet face-to-face, the developer email list contains a rich
record of the software development process. Mockus et al. wrote Perl scripts to
extract information from the email archives. This information helped clarify how
development in open source differs from traditional methods.
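The original study used Perl scripts whose details are not given here. As a rough illustration of the same kind of extraction, the following is a minimal Python sketch, assuming the mailing-list archive is available as a standard mbox file. The function name `summarize_mbox` and the choice of fields (sender, date, subject) are hypothetical; a real study would extract whatever the research questions require.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def summarize_mbox(path):
    """Extract basic metadata from an mbox archive (hypothetical helper).

    Returns a list of per-message records and a Counter of messages
    per sender address. Assumes the archive is in standard mbox format.
    """
    # create=False makes a missing file an error instead of a new empty archive.
    box = mailbox.mbox(path, create=False)
    records = []
    try:
        for msg in box:
            # parseaddr splits 'Name <addr>' into ('Name', 'addr').
            _, addr = parseaddr(msg.get("From", ""))
            records.append({
                "sender": addr.lower(),  # normalize case so one person counts once
                "date": msg.get("Date", ""),
                "subject": msg.get("Subject", ""),
            })
    finally:
        box.close()
    senders = Counter(r["sender"] for r in records)
    return records, senders
```

For example, `records, senders = summarize_mbox("devel-list.mbox")` (file name hypothetical) followed by `senders.most_common(10)` would list the ten most active posters, one simple starting point for asking who does the work in an open source project.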
Reporting guidelines: The exact nature of the collected data needs to be specified,
along with any special considerations, such as whether any data are missing or
uninterpretable for some reason. Any special processing of the data also needs
to be reported, for example if only a certain proportion of the records was selected for analysis.
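As a rough aid for these reporting requirements, and for checking the earlier observation that descriptive fields are often left empty, the following minimal Python sketch computes the fraction of records in which each field is absent or blank, and summarizes what share of the collected records was analyzed. It assumes the work records have already been exported as a list of dictionaries; the helper names and the field names in the usage note are hypothetical, and treating whitespace-only text as "blank" is an assumption that a real study should state explicitly.

```python
def missing_field_report(records, fields):
    """Return, for each field, the fraction of records where it is absent or blank.

    'Blank' here means None or a value that is empty after stripping
    whitespace (an assumption; adjust to the tracker's conventions).
    """
    total = len(records)
    report = {}
    for field in fields:
        missing = sum(
            1 for r in records
            if r.get(field) is None or str(r.get(field)).strip() == ""
        )
        report[field] = missing / total if total else 0.0
    return report


def sample_summary(all_records, analyzed_records):
    """Describe what proportion of the collected records was actually analyzed."""
    n_all, n_used = len(all_records), len(analyzed_records)
    share = n_used / n_all if n_all else 0.0
    return f"{n_used} of {n_all} records analyzed ({share:.0%})"
```

For instance, calling `missing_field_report(records, ["description", "resolution"])` on exported tracker records gives per-field missing rates that can be reported directly, and `sample_summary` produces a one-line statement of how much of the data set was used.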