Data mining is not restricted to one kind of data store or repository. It can
be applied to many kinds of data stores and repositories, although the
algorithms and techniques that suit one kind of data may not suit another.
Data mining can be applied to flat files, relational databases, data warehouses,
transaction databases, multimedia databases, spatial databases,
time-series databases, and the World Wide Web.
Flat Files: Flat files are the most commonly used data source in data mining.
A flat file is normally a simple data file in text or binary format, with a
structure that the data mining algorithm in use can recognize, for example a
text file in comma-separated format.
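As a minimal sketch of reading such a file, the snippet below parses a small
comma-separated table with Python's standard csv module. The column names and
values are hypothetical, and the data is held in a string rather than a file on
disk so the example is self-contained.

```python
import csv
import io

# Hypothetical comma-separated data; in practice this would be a file on disk.
RAW = """customer_id,age,purchases
1,34,5
2,27,2
3,45,9
"""


def load_records(text):
    """Parse comma-separated text into a list of dicts with integer fields."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: int(value) for key, value in row.items()} for row in reader]


records = load_records(RAW)

# A trivial summary statistic computed over the parsed records.
average_purchases = sum(r["purchases"] for r in records) / len(records)
print(records[0], average_purchases)
```

The first line of the file supplies the attribute names, and every following
line becomes one record that a mining algorithm could consume.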
Relational Databases: A relational database consists of a set of tables.
Each table has a number of columns and rows, where columns represent
attributes and rows represent tuples. Each tuple in a table represents
an object or a relationship between objects and is identified by a set
of attributes that form a unique key. SQL is the most common query
language for relational databases; it is used to retrieve and manage
the data stored in them.
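To illustrate tables, tuples, keys, and SQL together, here is a small sketch
using Python's built-in sqlite3 module with an in-memory database. The customer
table, its columns, and its rows are hypothetical and chosen only for
illustration.

```python
import sqlite3

# In-memory database, so the example leaves no file behind.
conn = sqlite3.connect(":memory:")

# One table: columns are attributes, customer_id is the unique key.
conn.execute(
    "CREATE TABLE customer ("
    "customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

# Each inserted row is one tuple describing one object (a customer).
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ann", "Oslo"), (2, "Bob", "Lima"), (3, "Cid", "Oslo")],
)

# SQL retrieves and summarizes the stored data.
rows = conn.execute(
    "SELECT city, COUNT(*) FROM customer GROUP BY city ORDER BY city"
).fetchall()
print(rows)

# The key must stay unique: a second tuple with customer_id 1 is rejected.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Dan', 'Rome')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```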
Data Warehouses: One widely quoted definition of a data warehouse reads:
“A data warehouse is a subject oriented, integrated, time variant, and nonvolatile
collection of data in support of management’s decision making process”.
The data in a data warehouse is loaded, preprocessed, and integrated together.
Organizing the data warehouse around different subjects makes the data
easier to analyze and facilitates the decision-making process.
A data warehouse is typically modeled as multidimensional data cubes.
Because of their structure and the precomputed summarized data, data
cubes are well suited for interactive querying and analysis of data at different
conceptual levels, known as Online Analytical Processing (OLAP).
OLAP allows navigation of the data at different levels of abstraction through
operations such as drill-down, roll-up, slice, dice, and pivot.
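The following sketch shows what roll-up and slice mean on a tiny fact table,
using plain Python. The dimensions, the sales figures, and the helper names
aggregate and slice_ are all hypothetical. A real OLAP system would typically
precompute such summaries; here they are computed on demand for clarity.

```python
from collections import defaultdict

# Hypothetical fact table: (year, quarter, region, product, sales).
FACTS = [
    (2023, "Q1", "North", "laptop", 120),
    (2023, "Q1", "South", "laptop", 80),
    (2023, "Q2", "North", "phone", 200),
    (2023, "Q2", "South", "phone", 150),
    (2024, "Q1", "North", "laptop", 130),
    (2024, "Q1", "South", "phone", 170),
]
DIMS = ("year", "quarter", "region", "product")


def aggregate(facts, group_by):
    """Sum sales over every dimension that is not listed in group_by."""
    idx = [DIMS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(facts, dim, value):
    """Keep only the facts where one dimension has a fixed value."""
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]


by_quarter = aggregate(FACTS, ("year", "quarter"))  # finer level of detail
by_year = aggregate(FACTS, ("year",))               # roll-up: quarter -> year
north_by_year = aggregate(slice_(FACTS, "region", "North"), ("year",))

print(by_quarter)
print(by_year)
print(north_by_year)
```

Going from by_year back to by_quarter is the corresponding drill-down: the same
data viewed at a lower level of abstraction.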
Multimedia Databases: Multimedia databases store images, audio, and
video. Multimedia objects are high dimensional, which makes mining
them a challenging process.
Time-Series Databases: Time-series databases contain time-related data
such as stock market data or the number of car accidents in a region
over time.
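As a small illustration of time-related data, the sketch below turns a list of
accident dates into a monthly series. The dates are made up for the example,
and grouping by month is just one assumed choice of time granularity.

```python
from collections import Counter
from datetime import date

# Hypothetical accident dates for one region.
ACCIDENTS = [
    date(2024, 1, 5),
    date(2024, 1, 19),
    date(2024, 2, 2),
    date(2024, 2, 14),
    date(2024, 2, 27),
    date(2024, 3, 8),
]

# Count accidents per (year, month), then order the counts chronologically.
per_month = Counter((d.year, d.month) for d in ACCIDENTS)
series = sorted(per_month.items())
print(series)
```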
The World Wide Web: The WWW is a heterogeneous and dynamic
data repository. It holds a huge number of data types, including
text, audio, video, raw data, and applications. The WWW is composed
of three major parts: the web content, which is the documents
available on the web; the web structure, which is the hyperlinks
and the relationships between different documents on the web;
and the web usage, which describes how web documents
and resources are accessed and used.
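The three parts can be pictured with a toy data model. In the sketch below the
page paths, page texts, links, and access log are all hypothetical. Content is
a mapping from page to text, structure is a mapping from page to the pages it
links to, and usage is a list of page requests.

```python
from collections import Counter

# Web content: the documents themselves (hypothetical pages).
CONTENT = {
    "/home": "Welcome to the shop",
    "/products": "Laptops and phones",
    "/about": "Who we are",
}

# Web structure: hyperlinks from each page to other pages.
LINKS = {
    "/home": ["/products", "/about"],
    "/products": ["/home"],
    "/about": ["/home", "/products"],
}

# Web usage: a log of which pages were requested, in order.
ACCESS_LOG = ["/home", "/products", "/home", "/about", "/products", "/home"]

# Structure view: how many hyperlinks point at each page.
in_links = Counter(target for targets in LINKS.values() for target in targets)

# Usage view: how often each page was visited.
visits = Counter(ACCESS_LOG)

print(dict(in_links))
print(dict(visits))
```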