Data Glossary
Office of Data Analytics and Business Intelligence | June 24, 2021
This glossary covers terms commonly used when communicating about and
working with data day to day.
Application
An application is a piece of software, designed to run on the web or on
mobile phones, that connects to large databases. Applications are one way of
consuming open data and can be real-time, personalized, and location-specific.
Application Programming Interface (API)
A computing interface that defines
interactions between multiple software or mixed hardware-software
intermediaries. In simple terms, an API can be used to connect to data and
extract it. The Dallas Open Data Portal (ODP) provides an API for every
published dataset, so data can be extracted and analyzed without manually
downloading it from the portal.
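As a rough illustration of connecting to data through an API, the sketch below builds a request URL and decodes a JSON response using only the Python standard library. The endpoint URL, the dataset identifier, and the `limit` parameter are hypothetical placeholders, not the portal's documented API; the actual endpoint should be taken from the dataset's own page.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: replace with the API URL listed on a dataset's page.
BASE_URL = "https://data.example.org/resource/abcd-1234.json"


def build_url(base_url, **params):
    """Append query parameters (if any) to an API endpoint."""
    return f"{base_url}?{urlencode(params)}" if params else base_url


def fetch_rows(url):
    """Download a JSON response and decode it into Python objects."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


# "limit" is an assumed parameter name used only for illustration.
url = build_url(BASE_URL, limit=5)
print(url)
# rows = fetch_rows(url)  # uncomment once BASE_URL points at a real endpoint
```

The download step is left commented out because the placeholder URL does not point at a real dataset.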
Attribute
A
characteristic of a geographic feature, typically stored in tabular format and
linked to the feature in a relational database. The attributes of a
well-represented point might include an identification number, address, and
type.
Base Layer
A primary layer for spatial reference, upon which other layers are built.
Typical base layers are parcels or street centerlines.
Buffer
A zone of a
specified distance around a feature.
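A minimal sketch of the idea, assuming a point feature and planar x,y coordinates in the same units as the buffer distance: a location falls inside the buffer when its straight-line distance to the feature does not exceed the buffer distance. The function name and the sample coordinates are made up for illustration; GIS software also buffers lines and polygons, which this sketch does not cover.

```python
import math


def within_buffer(feature, point, distance):
    """Return True if `point` lies within `distance` of a point feature."""
    dx = point[0] - feature[0]
    dy = point[1] - feature[1]
    return math.hypot(dx, dy) <= distance


hydrant = (100.0, 200.0)  # hypothetical feature location
inside = within_buffer(hydrant, (130.0, 240.0), 50.0)   # 50 units away
outside = within_buffer(hydrant, (160.0, 200.0), 50.0)  # 60 units away
print(inside, outside)
```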
Catalog
A catalog is a collection of datasets,
maps and other visuals. The Dallas Open Data Portal catalog has three broad
categories: economy, services, and public safety. All datasets are cataloged
under one of these three categories.
Category
Methodology by which items or datasets are classified or grouped under a similar theme or topic. Also referred to as Taxonomy.
Comma Separated Values File (CSV)
A standard format for spreadsheets
where data is stored in a plain text file, with each data row on a new line and
commas separating the values on each row.
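The layout described above can be seen in a short sketch that reads CSV text with Python's standard `csv` module. The column names and values are made-up sample data.

```python
import csv
import io

# Made-up sample: a header row, then one data row per line.
csv_text = "id,address,type\n1,100 Main St,hydrant\n2,200 Elm St,sign\n"

# DictReader uses the first line as column names for each later row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

for row in rows:
    print(row["id"], row["address"], row["type"])
```

Note that every value is read as text; converting `id` to a number is left to the reader of the file.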
Computer Aided Design (CAD)
An automated
system for the design, drafting and display of graphically oriented
information.
Confidential Data
Data that should not be disclosed because disclosure is prohibited by law or could be detrimental to the City or to other individuals or entities. Also referred to as Protected Data.
Coordinate
An x,y location in a Cartesian coordinate system or an x,y,z location in a
three-dimensional system. Coordinates represent locations on the Earth’s
surface relative to other locations.
Data
Data can be defined broadly as
information collected on a specific subject. Data can be divided further
into structured data (organized in rows and columns) and unstructured data
(such as images). Data includes numbers or information located in tables,
graphs, maps, images and other forms. Data can be used to understand the
spatial and temporal trends of a phenomenon, to identify the factors
associated with it, and to predict how it may develop in the future.
Data Cleaning or Scrubbing
Data in its raw form needs to be
cleaned and processed to eliminate errors. Data scrubbing can take as much as
80% of an analyst's time in the entire analytical process.
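As a small, hedged illustration of what cleaning can involve, the sketch below trims stray whitespace, normalizes letter case, marks empty values as missing, and drops duplicate rows. The field names and sample rows are invented, and real cleaning rules depend on the dataset.

```python
# Made-up raw rows showing typical problems.
raw_rows = [
    {"id": "1", "address": " 100 Main St ", "type": "Hydrant"},
    {"id": "1", "address": "100 Main St", "type": "hydrant"},  # duplicate
    {"id": "2", "address": "", "type": "SIGN"},                # no address
]


def clean(rows):
    """Trim whitespace, normalize case, flag blanks, drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        tidy = {
            "id": row["id"].strip(),
            "address": row["address"].strip() or None,  # blank -> missing
            "type": row["type"].strip().lower(),
        }
        key = (tidy["id"], tidy["address"], tidy["type"])
        if key in seen:
            continue  # skip rows identical to one already kept
        seen.add(key)
        cleaned.append(tidy)
    return cleaned


cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```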
Data Dictionary
A data dictionary is information that accompanies a data table and
describes its variables and their names.
Data Portal
A data portal, in this case the City of
Dallas Open Data Portal, is a platform where data and maps from the City of
Dallas and other valid sources are published and shared with the public free of
cost. Data on the Dallas Open Data Portal is regularly maintained and refreshed
to keep it recent and relevant for users.
Data Schema
A specification that defines the structure of the data (required data elements and types, and supporting definitions).
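One way to picture a schema is as a table of required elements and types that each record is checked against. The sketch below is an assumption-laden toy: the `SCHEMA` mapping, its field names, and the `validate` helper are hypothetical and do not reflect any particular schema standard.

```python
# Hypothetical schema: field name -> (expected type, required?)
SCHEMA = {
    "id": (int, True),
    "address": (str, True),
    "type": (str, False),
}


def validate(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, (expected_type, required) in schema.items():
        if field not in record:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        if not isinstance(record[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems


good = validate({"id": 1, "address": "100 Main St"})
bad = validate({"id": "1", "type": "sign"})
print(good)
print(bad)
```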
Data Story/Perspective Pages
Data stories or perspective pages
(inside the Dallas Open Data Portal) are a very important part of data
communication, particularly for those who are not familiar with a particular
dataset. Data stories make it easier for the audience to understand the
objective of data collection, processing and analysis, and help them understand
the conclusions of the analysis.
Data Table
A data table is data arranged in tabular form and displayed in rows and
columns.
Data Users
Any individual or organization that
accesses, downloads, or analyzes data, or uses it to develop apps,
visualizations, reports, and other information products or services.
Database
A logical
collection of interrelated information, managed and stored as a unit. A GIS
database includes data about the spatial location and shape of geographic
features recorded as points, lines, and polygons as well as their attributes.
Dataset
A dataset is any organized collection
of data. Dataset is a flexible term and may refer to an entire database, a
spreadsheet or other data file, or a related collection of data resources.
Digital Elevation Model (DEM)
Terrain
elevation data provided in digital form.
Digitize
To encode
map features as x,y coordinates in digital form. Lines are traced to define
their shapes. This can be accomplished either manually or by use of a scanner.
File Format
The file format refers to the internal
arrangement (format) of the file, not how it is displayed to users. For
example, CSV, XLS, and JSON files are structured very differently, but may look
similar or identical when opened in a spreadsheet program. The format usually
corresponds to the file extension, the last part of the file name.
Flat Files
A flat file is an informal term for a
single table of data from which all word processing or other structure
characters or markup have been removed. A flat file stores data in plain text
format. Because of their simple structure, flat files can only be read, stored
and sent. CSV files are one of the most common types of flat files.
Geocode
The process
of identifying a location by one or more attributes from a base layer.
Geographic Information System (GIS)
An organized
collection of computer hardware, software, geographic data, and personnel
designed to efficiently capture, store, update, manipulate, analyze, and
display all forms of geographically referenced information.
Geospatial Data
Data related to the position of things in the real world, including boundaries or locations.
GitHub
GitHub is a code-hosting platform for
version control and collaboration. It allows users to work together on projects
from any location and supports open source development.
Global Positioning System (GPS)
A satellite-based device that records x,y,z coordinates and other data.
Ground locations are calculated from signals sent by satellites orbiting the
Earth. GPS devices can be taken into the field to record data while walking,
driving, or flying.
JavaScript Object Notation (JSON)
A simple format for data that can
describe complex data structures, is both machine-readable and somewhat
human-readable, is independent of platform and programming language, and has
become a format for data exchange between apps, programs and computer systems.
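To make this concrete, the sketch below writes a small nested record as JSON text and reads it back with Python's standard `json` module. The record contents are invented sample data.

```python
import json

# Made-up record with a list and a nested object.
record = {
    "id": 1,
    "address": "100 Main St",
    "tags": ["water", "public safety"],
    "location": {"x": 100.0, "y": 200.0},
}

text = json.dumps(record, indent=2)  # Python object -> JSON text
restored = json.loads(text)          # JSON text -> Python object

print(text)
```

The same text could be read by a program written in another language, which is what makes JSON useful for data exchange.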
Layer
A logical
set of thematic data described and stored in a map library. Layers act as
digital transparencies that can be laid atop one another for viewing or spatial
analysis.
Line
Lines
represent geographic features too narrow to be displayed as an area at a given
scale, such as contours, street centerlines, or streams.
Metadata
Metadata is information about a
dataset that makes the data easier to find or identify. Metadata includes the
title and description, method of collection, limitations, author, publisher,
time period covered, license, date and frequency of release. Metadata describes
the dataset’s structure, data elements, its creation, access, format, and
content.
Open Data Commons Attribution License (ODC-By)
An agreement intended to allow users
to freely share, modify, and use a database, provided they attribute its
source.
Ortho Imagery
Aerial photographs that have been rectified to produce an accurate image of
the Earth by removing the tilt and relief displacements that occurred when the
photo was taken.
Point
A single x,y
coordinate that represents a geographic feature too small to be displayed as a
line or area at that scale.
Polygon
A multisided
figure that represents area on a map. Polygons have attributes that describe
the geographic feature they represent.
Query
A request for data or information from a database table or combination of tables. This data may be generated as results returned by Structured Query Language (SQL) or as pictorials, graphs or complex results, e.g., trend analyses from data-mining tools.
Scale
The ratio or
relationship between a distance or area on a map and the corresponding distance
or area on the ground.
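A worked example of the ratio, assuming a map scale of 1:24,000 (chosen only for illustration): one unit on the map corresponds to 24,000 of the same units on the ground, so 2 cm on the map is 48,000 cm, or 480 m, on the ground.

```python
def ground_distance_m(map_distance_cm, scale_denominator):
    """Convert a map distance in centimeters to a ground distance in meters."""
    ground_cm = map_distance_cm * scale_denominator
    return ground_cm / 100.0  # 100 cm per meter


distance = ground_distance_m(2.0, 24000)
print(distance)  # 480.0
```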
Shapefile
The shapefile format is a popular
geospatial vector data format for geographic information system (GIS) software.
The shapefile format can spatially describe vector features: points, lines, and
polygons, representing, for example, water wells, rivers, and lakes.
Spatial
Spatial is a metadata term that means
the dataset has locational (geographic) information such as coordinates,
address, city, or ZIP code.
Spatial Analysis
The process
of modeling, examining, and interpreting model results. Spatial analysis is
useful for evaluating suitability and capability, for estimating and predicting,
and for interpreting and understanding.
Structured Data
Structured data refers to information
with a high degree of organization, making the data readily searchable by
search engines. For publishing consistency, a tall layout is generally
preferred to a wide one.
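The glossary does not define tall and wide. Assuming the common usage, where a wide table spreads one variable across many columns (for example one column per year) and a tall table stores one observation per row, the sketch below reshapes made-up wide rows into tall ones. The helper name and field names are hypothetical.

```python
# Made-up wide table: one column per year.
wide_rows = [
    {"district": "A", "2019": 10, "2020": 12},
    {"district": "B", "2019": 7, "2020": 9},
]


def wide_to_tall(rows, id_field, var_name, value_name):
    """Turn every non-id column into its own row."""
    tall = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue
            tall.append(
                {id_field: row[id_field], var_name: column, value_name: value}
            )
    return tall


tall_rows = wide_to_tall(wide_rows, "district", "year", "count")
for row in tall_rows:
    print(row)
```

In the tall form, adding another year means adding rows rather than a new column, so the table's structure stays the same from one release to the next.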
Structured Query Language (SQL)
A syntax for defining and manipulating data in a relational database.
Developed by IBM in the 1970s, it has become an industry standard for query
languages in most relational database management systems.
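A minimal sketch of defining, loading, and querying data with SQL, run against an in-memory SQLite database through Python's standard `sqlite3` module. The table name, columns, and rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database, nothing saved to disk

# Define a table, then load a few sample rows.
conn.execute(
    "CREATE TABLE requests (id INTEGER PRIMARY KEY, category TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO requests (category, status) VALUES (?, ?)",
    [("pothole", "open"), ("pothole", "closed"), ("streetlight", "open")],
)

# Query: count open requests in each category.
rows = conn.execute(
    "SELECT category, COUNT(*) FROM requests "
    "WHERE status = ? GROUP BY category ORDER BY category",
    ("open",),
).fetchall()

print(rows)
conn.close()
```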
Tag
A tag is a keyword or term assigned to
a piece of information or a file. This type of metadata helps describe an item
and allows it to be found by browsing or searching.
Theme
An ArcView theme stores map features as primary features (such as arcs, nodes, polygons, and points) and
secondary features such as tics, map extent, links, and annotation. A theme usually represents a single geographic
layer, such as soils, roads, or land use.
Unstructured Data
Unstructured data (or unstructured
information) is information that either does not have a pre-defined data model
or is not organized in a pre-defined manner, such as a flat file. Unstructured
information is typically text-heavy, but may contain data such as dates,
numbers, and facts as well.
Visualization
A visual representation of data, such
as a chart, graph or dashboard, is often the easiest way of communicating with
data, bringing out its key features. Many visualization tools exist, such as
Google Charts, Excel, ArcGIS, Tableau, and Power BI. Creating a visualization
of a dataset requires careful attention to the meaning of the variables, the
relations between them, and the story they tell.
XML
Extensible Markup Language (XML) is a flexible file format designed to store, transport and share data over the Internet. XML is both human- and machine-readable.
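For illustration, the sketch below parses a small XML document with Python's standard `xml.etree.ElementTree` module. The element and attribute names are invented sample markup, not a format used by any particular portal.

```python
import xml.etree.ElementTree as ET

# Made-up XML: tags label each piece of data, which keeps it readable.
xml_text = """<dataset title="Sample">
  <row id="1"><address>100 Main St</address></row>
  <row id="2"><address>200 Elm St</address></row>
</dataset>"""

root = ET.fromstring(xml_text)
title = root.get("title")
addresses = [row.findtext("address") for row in root.findall("row")]

print(title, addresses)
```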