data.gov.sg

Data principles


Last updated 1 year ago


While clear titles and descriptions are important, your data lies at the core of what users are looking for. To provide users with data that is easy and reliable to use, we have a few guiding principles.

  1. Ensure machine readability

Data should be in a format a computer can understand. This means relevant fields can be extracted and parsed without human input. As we are an open data collective, any data you upload must also be in a non-proprietary file format.
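
As a rough illustration of "parsed without human input", the sketch below reads a small CSV with Python's standard `csv` module and converts each field to a typed value. The column names, the sample values, and the `read_rows` helper are all made up for this example and are not part of any data.gov.sg schema or API.

```python
import csv
import io

# Hypothetical sample: one header row, then one record per line,
# with plain values and no merged cells, footnotes or formatting.
SAMPLE_CSV = (
    "year,town,resident_count\n"
    "2023,Town A,1200\n"
    "2023,Town B,1350\n"
)


def read_rows(text):
    """Parse CSV text into a list of dicts with typed values."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "year": int(record["year"]),
            "town": record["town"],
            "resident_count": int(record["resident_count"]),
        })
    return rows


rows = read_rows(SAMPLE_CSV)
print(rows)
```

If a file needs manual clean-up before a loop like this can run (for example, removing title rows or notes embedded in cells), it is probably not yet machine readable.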

  2. Check for errors and inconsistencies

Your data should be free from errors and inconsistencies. This goes a long way towards establishing trust, protecting the integrity of your data, and allowing users to combine multiple datasets, even across agencies.

We strongly encourage correcting any issues that negatively affect data quality, such as:

  • Null values in columns. If all values in a column are null, consider removing the column.

  • Duplicate rows

  • Outliers. Having many outliers is not necessarily a cause for concern, but impossible values should not appear. For example, negative values should not appear in columns that can only hold positive ones.

  • Inconsistent capitalisation in datasets. For example, if a value in a column is entered in all uppercase, all other values in that same column should also be entered in all uppercase.

    Consistency in capitalisation is important, as the same value appearing more than once in different casing may be considered different values altogether. For simplicity, we suggest sticking to one capitalisation format.

  • Inconsistent spacing in column values. For example, entering “hello” without a space at the end in one row, and “hello ” with a space at the end in another.
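
The checks in the list above can be automated before publishing. The following is a minimal sketch using only Python's standard library; the `find_issues` helper, the sample data, and the choice of `amount` as a column that must not be negative are assumptions made for illustration. It also assumes every row has the same number of fields and that the checked numeric columns contain only numbers or empty cells.

```python
import csv
import io
from collections import Counter

# Hypothetical sample containing each issue from the list above.
SAMPLE_CSV = (
    "name,category,amount,notes\n"
    "Alpha,FOOD,10,\n"
    "Beta,food,-5,\n"
    "Alpha,FOOD,10,\n"
    "Gamma ,FOOD,7,\n"
)


def find_issues(text, non_negative=("amount",)):
    """Return a list of human-readable data quality warnings for CSV text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0]) if rows else []
    issues = []

    # Duplicate rows: count every repeat beyond the first occurrence.
    seen = Counter(tuple(row[col] for col in columns) for row in rows)
    repeats = sum(count - 1 for count in seen.values())
    if repeats:
        issues.append(f"{repeats} duplicate row(s)")

    for col in columns:
        values = [row[col] for row in rows]

        # Columns where every value is empty (treated here as null).
        if all(value.strip() == "" for value in values):
            issues.append(f"column '{col}' is entirely empty")
            continue

        # Leading or trailing spaces.
        padded = sorted({value for value in values if value != value.strip()})
        if padded:
            issues.append(f"column '{col}' has padded values: {padded}")

        # The same value written in different casing.
        spellings = {}
        for value in values:
            spellings.setdefault(value.strip().casefold(), set()).add(value.strip())
        mixed = sorted(sorted(group) for group in spellings.values() if len(group) > 1)
        if mixed:
            issues.append(f"column '{col}' mixes casing: {mixed}")

    # Impossible values: negatives in columns that should never be negative.
    for col in non_negative:
        for line_number, row in enumerate(rows, start=2):  # line 1 is the header
            value = row[col].strip()
            if value and float(value) < 0:
                issues.append(f"column '{col}' is negative on line {line_number}")

    return issues


issues = find_issues(SAMPLE_CSV)
for issue in issues:
    print(issue)
```

A script like this only flags candidates. Whether a flagged value is a genuine error still needs a judgment from someone who knows the data.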

Additionally, your data should be kept tidy, a concept introduced in Hadley Wickham’s “Tidy Data”, published in The Journal of Statistical Software. Ultimately, clearer organisation makes it easier for users to understand and use your data.

Drawing from these principles, you should ensure:

  • Each column uses a single unit of measurement throughout

  • Each row records one observation. An observation is all the variable information collected on a single subject or participant. For example, if your dataset looks at people, one row might contain a single person’s height, weight, and age. While the actual measurements of different people may vary from row to row, what you are measuring in each row shouldn’t.

  • Each table has only one type of observation unit. If your data looks at the average height of a population, one observation unit could be the overall population, and another could be the population broken down by gender. This means you might have one table for overall average height, and one for average height by gender.
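
One common way a table breaks the "one observation per row" rule is by spreading a variable across column headers, such as one column per year. The sketch below reshapes such a table so that each row holds a single observation. The `to_long` helper, the column names, and the numbers are hypothetical and only illustrate the idea.

```python
# Hypothetical "wide" table: the years are column headers, so each row
# holds several observations for the same town.
wide_rows = [
    {"town": "Town A", "2022": 100, "2023": 110},
    {"town": "Town B", "2022": 80, "2023": 95},
]


def to_long(rows, id_column, variable_name, value_name):
    """Turn every non-id column into its own row (one observation per row)."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_column:
                continue
            long_rows.append({
                id_column: row[id_column],
                variable_name: key,
                value_name: value,
            })
    return long_rows


tidy_rows = to_long(wide_rows, "town", "year", "count")
for row in tidy_rows:
    print(row)
```

In the reshaped table, `year` and `count` are ordinary columns, so users can filter or group by year without needing to know which headers are really data.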

  3. Granularity and precision

As far as possible, data should be raw and granular rather than aggregated or processed (for example, expressed as percentages).

Where totals and sub-totals are needed, place them in separate tables. This applies, for example, when aggregate figures such as totals and indices cannot be derived from the granular data points.
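
To show why raw values are preferred, the sketch below starts from raw counts and derives a total and percentages from them. The group names and counts are invented for illustration. The reverse is not possible: given only the percentages, a user cannot recover the underlying counts.

```python
# Hypothetical raw, granular data: one count per group.
raw_counts = [
    {"group": "A", "responses": 30},
    {"group": "B", "responses": 50},
    {"group": "C", "responses": 20},
]

# Aggregates are easy to derive from the raw counts.
total = sum(row["responses"] for row in raw_counts)
percentages = {
    row["group"]: 100 * row["responses"] / total
    for row in raw_counts
}

print(total)
print(percentages)
```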

Get in touch

Our guidelines are always up for review to give our users the best experience possible. Have more feedback or questions? Contact us and we will reach out to you as soon as we can.
