# Data principles

While clear titles and descriptions are important, your data lies at the core of what users are looking for. To provide users with data that is easy and reliable to use, we have a few guiding principles.

1. **Ensure machine readability**

Data should be in a format a computer can understand. This means relevant fields can be extracted and parsed without human input. As we are an open data collective, any data you upload must also be in a non-proprietary file format.
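As a rough illustration of what "parsed without human input" can look like in practice, the sketch below uses Python's standard `csv` module to check that a CSV file has usable column names and that every row has the same number of fields as the header. The function name `check_machine_readable` and the sample data are hypothetical, and this is one possible check rather than an official validation rule.

```python
import csv
import io


def check_machine_readable(csv_text):
    """Return a list of problems that would stop a program parsing the data."""
    problems = []
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    if any(name.strip() == "" for name in header):
        problems.append("header contains an empty column name")
    if len(set(header)) != len(header):
        problems.append("header contains duplicate column names")
    # Every data row should have exactly one field per header column.
    for row_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_number} has {len(row)} fields, expected {len(header)}"
            )
    return problems


# Hypothetical sample: the last row is missing a field.
sample = "year,town,population\n2020,Bedok,1000\n2021,Bedok\n"
print(check_machine_readable(sample))
```

An empty list means the file passed these particular checks. It does not guarantee that the values themselves are correct.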

2. **Check for errors and inconsistencies**

Your data should be free from errors and inconsistencies. This goes a long way towards establishing trust, protecting the integrity of your data, and allowing multiple datasets to be combined or used together, even across agencies.

We strongly encourage correcting any issues that negatively affect data quality, such as:

* Null values in columns. If all values in a column are null, consider removing the column.
* Duplicate rows
* Outliers. Having many outliers is not necessarily a cause for concern, but impossible values should not appear. For example, negative values should not appear in a column that should only contain positive ones.
* Inconsistent capitalisation in datasets. For example, if a value in a column is entered in all uppercase, all other values in that same column should also be entered in all uppercase.

  Consistency in capitalisation is important, as the same value appearing more than once in different casing may be treated as different values altogether. For simplicity, we suggest sticking to one capitalisation format.
* Inconsistent spacing in column values. For example, “hello” entered without a trailing space and “hello ” entered with one.
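The checks listed above can be partly automated before upload. The following is a minimal sketch using only Python's standard library. The function name `find_quality_issues`, its `non_negative_columns` parameter, and the sample data are all hypothetical, and treating an empty string as a null value is an assumption of this example.

```python
import csv
import io


def find_quality_issues(csv_text, non_negative_columns=()):
    """Return a list of human-readable data quality issues found in csv_text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    issues = []
    if not rows:
        return issues
    columns = list(rows[0].keys())

    # Columns in which every value is empty (treated here as null).
    for column in columns:
        if all((row[column] or "") == "" for row in rows):
            issues.append(f"column '{column}' is entirely null")

    # Exact duplicate rows.
    seen = set()
    for number, row in enumerate(rows, start=1):
        key = tuple(row[column] for column in columns)
        if key in seen:
            issues.append(f"data row {number} is a duplicate")
        seen.add(key)

    for column in columns:
        values = [row[column] or "" for row in rows]
        # Leading or trailing spaces, such as "hello " versus "hello".
        if any(value != value.strip() for value in values):
            issues.append(f"column '{column}' has leading or trailing spaces")
        # The same value written in different casing, such as "Yes" and "YES".
        distinct = {value.strip() for value in values if value.strip()}
        if len({value.lower() for value in distinct}) < len(distinct):
            issues.append(f"column '{column}' has inconsistent capitalisation")

    # Impossible values: negatives in columns that should never be negative.
    for column in non_negative_columns:
        for number, row in enumerate(rows, start=1):
            try:
                if float(row[column]) < 0:
                    issues.append(f"data row {number} has a negative '{column}'")
            except (TypeError, ValueError):
                pass  # Non-numeric or missing values are not judged here.
    return issues


# Hypothetical, made-up sample containing one of each kind of issue.
sample = (
    "name,status,age,notes\n"
    "Alice,Active,34,\n"
    "Bob,ACTIVE,-5,\n"
    "Cara ,active,41,\n"
    "Alice,Active,34,\n"
)
for issue in find_quality_issues(sample, non_negative_columns=["age"]):
    print(issue)
```

A script like this only flags candidates for review. Whether a flagged value is really an error still needs a decision from someone who knows the data.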

Additionally, your data should be kept *tidy*, a concept introduced in Hadley Wickham’s “Tidy Data”, published in the [*Journal of Statistical Software*](https://vita.had.co.nz/papers/tidy-data.html). Ultimately, clearer organisation makes it easier for users to understand and use your data.

Drawing from these principles, you should ensure:

* Each column uses the same unit of measurement throughout
* Each row represents one observation, that is, all the variables collected on a single subject or participant. For example, if your dataset looks at people, one row might contain a single person’s height, weight, and age. While the actual measurements of different people may vary from row to row, what you are measuring in each row shouldn’t.
* Each table has only one type of observational unit. For example, if your data looks at the average height of a population, one observational unit could be the overall population and another could be each gender within it. You might then have one table for overall average height and another for average height by gender.
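To make the last point concrete, here is a small sketch that splits a table mixing two observational units into two separate tables. The function name `split_by_observation_unit`, the field names, and the figures are all made up for illustration. The `_cm` suffix in the column name is one way to make the unit of measurement explicit.

```python
def split_by_observation_unit(rows):
    """Split rows into an overall-population table and a per-gender table."""
    overall = []
    by_gender = []
    for row in rows:
        if row["group"] == "All":
            # Observational unit: the whole population in a given year.
            overall.append(
                {"year": row["year"], "average_height_cm": row["average_height_cm"]}
            )
        else:
            # Observational unit: one gender within the population in a given year.
            by_gender.append(
                {
                    "year": row["year"],
                    "gender": row["group"],
                    "average_height_cm": row["average_height_cm"],
                }
            )
    return overall, by_gender


# Hypothetical, made-up figures in a table that mixes two observational units.
mixed = [
    {"year": 2020, "group": "All", "average_height_cm": 165.0},
    {"year": 2020, "group": "Female", "average_height_cm": 160.0},
    {"year": 2020, "group": "Male", "average_height_cm": 170.0},
]
overall, by_gender = split_by_observation_unit(mixed)
print(overall)
print(by_gender)
```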

3. **Granularity and precision**

As far as possible, data should be raw and granular rather than aggregated or processed. For example, provide the underlying counts rather than percentages.
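One reason to prefer raw values is that users can derive percentages from counts, but cannot recover the counts from percentages alone. The sketch below shows the derivation in one direction. The function name `add_percentages`, the field names, and the counts are hypothetical.

```python
def add_percentages(rows, count_field="count"):
    """Return copies of rows with a derived 'percentage' field added."""
    total = sum(row[count_field] for row in rows)
    if total == 0:
        raise ValueError("cannot compute percentages when the total is zero")
    return [
        {**row, "percentage": round(100 * row[count_field] / total, 1)}
        for row in rows
    ]


# Hypothetical, made-up raw counts as a publisher might provide them.
raw_counts = [
    {"mode": "bus", "count": 50},
    {"mode": "train", "count": 30},
    {"mode": "car", "count": 20},
]
for row in add_percentages(raw_counts):
    print(row["mode"], row["percentage"])
```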

If totals and sub-totals are needed, place them in separate tables. This applies, for example, when aggregate numbers such as totals and indices cannot be derived from the granular data points.

## Get in touch

Our guidelines are always up for review to give our users the best experience possible. Have more feedback or questions? [Contact us](https://form.gov.sg/6449e5c3664c1b001249acf1) and we will reach out to you as soon as we can.

