# Write APIs

Government admin users can use the Write APIs to publish data to data.gov.sg programmatically, without needing to log into the UI.

***

## Prerequisites

{% hint style="info" %}
If you're testing on staging, use the `https://api-staging.data.gov.sg/` domain instead of the `https://api-production.data.gov.sg/` domain used in the documentation below.
{% endhint %}

* You must be a government officer
* Your email must have admin permissions to manage datasets

## Step 1 - Generate API Keys:

* Create an API key from the [admin dashboard](https://beta.data.gov.sg/admin/api-keys).
* Store the API key in a secure location. It will be used later.
* Step-by-step instructions can be found here: [how-to-generate-api-keys](https://guide.data.gov.sg/user-guide/for-data-owners/how-to-generate-api-keys "mention")

## Step 2 - Verify API Connectivity:

{% openapi src="<https://2014478147-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fksq6aNCGbng1fqooERfi%2Fuploads%2Fy5CeK0CZ9cQPNPXSTU0G%2FOpenAPI%20Specification.yaml?alt=media&token=bce1df24-e860-4669-b712-1abfc8dec74f>" path="/v2/admin/api/auth/whoami" method="get" %}
[OpenAPI Specification.yaml](https://2014478147-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fksq6aNCGbng1fqooERfi%2Fuploads%2Fy5CeK0CZ9cQPNPXSTU0G%2FOpenAPI%20Specification.yaml?alt=media\&token=bce1df24-e860-4669-b712-1abfc8dec74f)
{% endopenapi %}

Send a `GET` request to the endpoint above, passing your admin API key in the request header:

<https://api-production.data.gov.sg/v2/admin/api/auth/whoami>

A `200` response means that you're connected.
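The check can be scripted. Below is a minimal Python sketch using only the standard library. It assumes the key is sent in an `x-api-key` header; confirm the exact header name against the OpenAPI specification above before relying on it. `build_whoami_request` and `check_connectivity` are illustrative names, not part of the API.

```python
import urllib.error
import urllib.request

PRODUCTION_BASE = "https://api-production.data.gov.sg"
STAGING_BASE = "https://api-staging.data.gov.sg"
WHOAMI_PATH = "/v2/admin/api/auth/whoami"

# ASSUMPTION: the admin API key goes in an "x-api-key" header.
# Check the OpenAPI specification for the exact header name.
API_KEY_HEADER = "x-api-key"


def build_whoami_request(api_key: str, base_url: str = PRODUCTION_BASE) -> urllib.request.Request:
    """Build a GET request for the whoami endpoint, authenticated with the admin API key."""
    return urllib.request.Request(
        base_url.rstrip("/") + WHOAMI_PATH,
        headers={API_KEY_HEADER: api_key},
        method="GET",
    )


def check_connectivity(api_key: str, base_url: str = PRODUCTION_BASE) -> bool:
    """Return True on HTTP 200; False on an HTTP error status or a network failure."""
    request = build_whoami_request(api_key, base_url)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        # Reached the server, but it rejected the request (e.g. wrong key or URL)
        print(f"whoami returned HTTP {err.code}: check your API key and URL")
        return False
    except urllib.error.URLError as err:
        # Could not reach the server at all (e.g. no internet access)
        print(f"Could not reach {base_url}: {err.reason}")
        return False
```

When testing on staging, call `check_connectivity("YOUR_API_KEY", STAGING_BASE)` instead of using the production default.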

**Common issues:**

1. Check your API URL: did you use the correct URL above (production or staging)?
2. Check your API key: did you use the correct one? If in doubt, regenerate the API key and test again.
3. Check your network configuration: can your environment reach the internet? data.gov.sg is an internet platform and requires internet connectivity.

## Step 3 - Publish Dataset:

* To push data, the dataset must already exist. You can create it from the admin dashboard.
* The dataset must already contain data.
* The dataset must be published.
* For more details on publishing data: [how-to-publish-data](https://guide.data.gov.sg/user-guide/for-data-owners/how-to-publish-data "mention")

Pushing data with the Write APIs involves three steps:

* Generating an upload link
* Uploading the file to that link
* Polling for the upload status

## Step 4 - Create Upload / Append URL:

There are two types of links you can create, depending on your use case:

1. If you're replacing an existing dataset completely, use the `Get Create Upload URL` method.
2. If you're appending rows to an existing dataset, use the `Get Create Append URL` method.
   1. Ensure the column names and data types are exactly the same and that no new columns have been added. The append API only supports appending rows to an existing dataset, not appending columns.

{% hint style="info" %}
Important notes:

1. The maximum file size is 1GB.
   1. If you want a dataset larger than 1GB, use the `Get Create Append URL` method and ensure each uploaded file is smaller than 1GB.
2. The generated pre-signed URL expires after 1 hour.
{% endhint %}
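If your data exceeds the 1GB limit, one option is to split it into several smaller CSV files and append them one at a time. The Python sketch below is illustrative, not an official tool: it assumes every appended file should start with the same header row as the existing dataset, and it reads 1GB as 10^9 bytes, which is slightly below 1 GiB and therefore conservative. `split_csv` is a hypothetical helper name.

```python
import csv
import io
from typing import Iterable, Iterator, Sequence

# Conservative reading of the documented 1GB limit (10**9 bytes < 1 GiB).
MAX_FILE_BYTES = 10**9


def _encode_row(row: Sequence[str]) -> bytes:
    """Serialise one CSV row exactly as csv.writer would, encoded as UTF-8."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(row)
    return buffer.getvalue().encode("utf-8")


def split_csv(source: Iterable[str], max_bytes: int = MAX_FILE_BYTES) -> Iterator[bytes]:
    """Yield CSV chunks of at most max_bytes, each starting with the header row.

    Rows are parsed with the csv module, so quoted fields containing newlines
    are never split across chunks.
    ASSUMPTION: every appended file must repeat the dataset's header row.
    """
    reader = csv.reader(source)
    header_row = next(reader, None)
    if header_row is None:
        return  # empty input: nothing to upload
    header = _encode_row(header_row)
    chunk, size = [header], len(header)
    for row in reader:
        encoded = _encode_row(row)
        if len(header) + len(encoded) > max_bytes:
            raise ValueError("A single row is larger than max_bytes")
        if size + len(encoded) > max_bytes:
            # Current chunk is full: emit it and start a new one with the header
            yield b"".join(chunk)
            chunk, size = [header], len(header)
        chunk.append(encoded)
        size += len(encoded)
    if len(chunk) > 1:
        yield b"".join(chunk)
```

To use it on a file, open the file with `newline=""` (as the `csv` module documentation recommends) and `encoding="utf-8"`, then upload each yielded chunk through its own append link, waiting for `Ingestion Success` before sending the next chunk.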

{% openapi src="<https://2014478147-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fksq6aNCGbng1fqooERfi%2Fuploads%2FPFep58QSVCNi4STG6Kwz%2Fv2-write-api-specification_20240216.yml?alt=media&token=eb292347-599e-418e-8f8a-13e486028524>" path="/v2/admin/api/datasets/{datasetId}/upload-link" method="get" %}
[v2-write-api-specification\_20240216.yml](https://2014478147-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fksq6aNCGbng1fqooERfi%2Fuploads%2FPFep58QSVCNi4STG6Kwz%2Fv2-write-api-specification_20240216.yml?alt=media\&token=eb292347-599e-418e-8f8a-13e486028524)
{% endopenapi %}

{% openapi src="<https://2014478147-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fksq6aNCGbng1fqooERfi%2Fuploads%2FymtmZMtrRsaocs1Xlsaa%2Fappend-api-specification.yml?alt=media&token=e919ad78-e172-434e-8885-d42cb6ea4778>" path="/v2/admin/api/datasets/{datasetId}/append-link" method="get" %}
[append-api-specification.yml](https://2014478147-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fksq6aNCGbng1fqooERfi%2Fuploads%2FymtmZMtrRsaocs1Xlsaa%2Fappend-api-specification.yml?alt=media\&token=e919ad78-e172-434e-8885-d42cb6ea4778)
{% endopenapi %}

{% hint style="info" %}
Find the `datasetId` in the dataset URL.

For example, for this URL: <https://beta.data.gov.sg/datasets/d_07c63be0f37e6e59c07a4ddc2fd87fcb/view>

The `datasetId` is `d_07c63be0f37e6e59c07a4ddc2fd87fcb`.

Hence, the endpoint used would be:\
<https://api-production.data.gov.sg/v2/admin/api/datasets/d_07c63be0f37e6e59c07a4ddc2fd87fcb/upload-link>
{% endhint %}
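The lookup above can be scripted. This minimal Python sketch assumes dataset IDs always follow the pattern seen in the examples on this page (`d_` followed by lowercase hex characters); `extract_dataset_id` and `link_endpoint` are illustrative names, not part of the API.

```python
import re

API_BASE = "https://api-production.data.gov.sg"  # use https://api-staging.data.gov.sg on staging

# ASSUMPTION: dataset IDs look like the examples: "d_" followed by hex characters.
DATASET_ID_PATTERN = re.compile(r"/datasets/(d_[0-9a-f]+)")


def extract_dataset_id(dataset_url: str) -> str:
    """Pull the datasetId out of a dataset page URL."""
    match = DATASET_ID_PATTERN.search(dataset_url)
    if match is None:
        raise ValueError(f"No datasetId found in {dataset_url!r}")
    return match.group(1)


def link_endpoint(dataset_id: str, append: bool = False, base_url: str = API_BASE) -> str:
    """Build the upload-link (replace) or append-link (append rows) endpoint URL."""
    action = "append-link" if append else "upload-link"
    return f"{base_url.rstrip('/')}/v2/admin/api/datasets/{dataset_id}/{action}"
```

For the example above, `link_endpoint(extract_dataset_id(url))` yields the same `upload-link` endpoint shown in the note; pass `append=True` for the append flow.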

## Step 5 - Upload Tabular File:

To upload your CSV file, send a `PUT` request directly to the S3 pre-signed URL returned in the previous step's response.

The URL should start with `https://s3.ap-southeast-1.amazonaws.com/`.

For example: `https://s3.ap-southeast-1.amazonaws.com/attachments.data.gov.sg/c/6047/private/d_2a6070fa301695904e1a626434189e59/VW50aXRsZWQgRGF0YXNldA/d_2a6070fa301695904e1a626434189e59.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=ASIAU7LWPY2WH4V4RGEJ%2F20231129%2Fap-southeast-1%2Fs3%2Faws4_request&X-Amz-Date=20231129T053810Z&X-Amz-Expires=3600&...`
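Because the link expires, it can help to check it before uploading. This minimal Python sketch reads the standard AWS Signature Version 4 query parameters visible in the example above: `X-Amz-Date` (signing time in UTC, format `YYYYMMDDTHHMMSSZ`) and `X-Amz-Expires` (lifetime in seconds). The prefix check mirrors the rule stated above; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlsplit

S3_PREFIX = "https://s3.ap-southeast-1.amazonaws.com/"


def presigned_url_expiry(url: str) -> datetime:
    """Return the UTC time after which an S3 SigV4 pre-signed URL stops working."""
    if not url.startswith(S3_PREFIX):
        raise ValueError(f"Expected a URL starting with {S3_PREFIX}; use the URL returned in Step 4")
    query = parse_qs(urlsplit(url).query)
    # X-Amz-Date is the signing time in UTC; X-Amz-Expires is the lifetime in seconds.
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """True if the pre-signed URL has expired; if so, generate a new one in Step 4."""
    now = now or datetime.now(timezone.utc)
    return now >= presigned_url_expiry(url)
```

For the example URL above (`X-Amz-Date=20231129T053810Z`, `X-Amz-Expires=3600`), the link would stop working at 06:38:10 UTC on 29 November 2023.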

**JavaScript**

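```javascript
// Upload CSV content to the S3 pre-signed URL returned in Step 4
const uploadCsv = async (presignedUrl, csvContent) => {
  const response = await fetch(presignedUrl, {
    method: 'PUT',
    headers: {
      'Content-Type': 'text/csv'
    },
    body: csvContent // send only the raw file data, not form-data
  });

  if (!response.ok) {
    // Throw so callers don't go on to poll for a file that was never uploaded
    throw new Error(`Upload failed: ${response.status} ${response.statusText}`);
  }
  console.log('Upload successful!');
};

// Usage (top-level await requires an ES module, e.g. an .mjs file in Node 18+)
const presignedUrl = '<pre-signed URL from Step 4>';
const csvData = `name,title,email
janedoe,SWE,jane@open.gov.sg
johnsmith,PM,john@open.gov.sg`;

await uploadCsv(presignedUrl, csvData);
```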

**Python**

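```python
import requests

# Upload CSV content to the S3 pre-signed URL returned in Step 4
def upload_csv(presigned_url, csv_content):
    headers = {'Content-Type': 'text/csv'}

    # Encode text explicitly so non-ASCII characters are sent as UTF-8
    if isinstance(csv_content, str):
        csv_content = csv_content.encode('utf-8')

    # Send only the raw file data as the request body (not multipart form-data)
    response = requests.put(presigned_url, headers=headers, data=csv_content, timeout=60)
    response.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx response

    return response

# Usage: replace with the pre-signed URL generated in Step 4
presigned_url = '<pre-signed URL from Step 4>'
csv_data = """name,title,email
janedoe,SWE,jane@open.gov.sg
johnsmith,PM,john@open.gov.sg"""

upload_csv(presigned_url, csv_data)
print("Upload successful!")

# To upload a file from disk, pass it opened in binary mode:
# with open('data.csv', 'rb') as f:
#     upload_csv(presigned_url, f)
```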

{% hint style="info" %}
Make sure you send a `PUT` request whose body contains only the file data. Although `POST`ing form-data is allowed, the file received will not be identical to your original and may cause errors.
{% endhint %}

**Common issues:**

1. Check that you're using a PUT request
2. Check that you're using the pre-signed URL generated for you in Step 4, not the example URL above
3. Set the header as `Content-Type: text/csv` when uploading the CSV file
4. Pre-signed URLs expire after 1 hour; if yours has expired, generate a new one

**Reference:**

{% embed url="<https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html>" %}

## Step 6 - Get Latest Ingestion Status:

{% openapi src="<https://2014478147-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fksq6aNCGbng1fqooERfi%2Fuploads%2FPFep58QSVCNi4STG6Kwz%2Fv2-write-api-specification_20240216.yml?alt=media&token=eb292347-599e-418e-8f8a-13e486028524>" path="/v2/admin/api/datasets/{datasetId}/ingestion-status" method="get" %}
[v2-write-api-specification\_20240216.yml](https://2014478147-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fksq6aNCGbng1fqooERfi%2Fuploads%2FPFep58QSVCNi4STG6Kwz%2Fv2-write-api-specification_20240216.yml?alt=media\&token=eb292347-599e-418e-8f8a-13e486028524)
{% endopenapi %}

Validation and ingestion on data.gov.sg are asynchronous: the upload request from the previous step returns before processing is complete. To track the validation and ingestion status, you need to poll this endpoint.

Here are the possible validation and ingestion statuses:

<table><thead><tr><th width="93">Order</th><th width="123">Status</th><th>Description</th></tr></thead><tbody><tr><td>1</td><td>Pending</td><td>The validation and ingestion process has been started</td></tr><tr><td>2</td><td>Pending Validation</td><td>The dataset has been queued for validation</td></tr><tr><td>2.1</td><td>Validation Failed</td><td>The dataset contains invalid values causing validation failures. Please review the reported errors, make the necessary changes and try again.</td></tr><tr><td>2.2</td><td>Validation Passed</td><td>The dataset has been validated successfully, but not yet ingested.</td></tr><tr><td>3</td><td>Pending Ingestion</td><td>The dataset has been queued for ingestion.</td></tr><tr><td>3.1</td><td>Ingestion Failed</td><td>The dataset failed to be ingested. Please try again later.</td></tr><tr><td>3.2</td><td>Ingestion Success</td><td>The dataset has been ingested successfully.</td></tr></tbody></table>
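Since polling is manual, a small helper keeps it tidy. This Python sketch avoids guessing the response format by taking a `fetch_status` callable that you provide (for example, one that sends a `GET` to the ingestion-status endpoint with your API key and returns the status field; the field name is in the specification above). It assumes the returned strings match the table exactly, and the 30-second interval and 15-minute timeout are arbitrary choices, not documented values. It waits one interval before the first poll, on the assumption that the endpoint may briefly still report an earlier upload's status.

```python
import time
from typing import Callable

# Terminal states from the status table; any other status means "keep waiting".
SUCCESS = "Ingestion Success"
FAILURES = {"Validation Failed", "Ingestion Failed"}


def wait_for_ingestion(
    fetch_status: Callable[[], str],
    interval_s: float = 30.0,
    timeout_s: float = 15 * 60,
) -> str:
    """Poll fetch_status() until ingestion succeeds, fails, or the timeout elapses.

    Returns "Ingestion Success"; raises RuntimeError on a failure status
    and TimeoutError if no terminal status arrives in time.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        time.sleep(interval_s)  # wait before every poll, including the first
        status = fetch_status()
        if status == SUCCESS:
            return status
        if status in FAILURES:
            raise RuntimeError(f"Upload did not complete: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still '{status}' after {timeout_s} seconds")
```

Raising on `Validation Failed` lets a script stop early so you can fix the data; as noted below, uploading the same file via the UI gives more detailed error messages.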

**Common questions:**

1. What happens when there are concurrent writes?
   1. If an earlier upload to the dataset has not completed, subsequent uploads automatically fail. This prevents simultaneous writes, which would cause unexpected behaviour.
   2. To avoid this, poll this endpoint and wait for the `Ingestion Success` status before uploading another file.
2. What happens if one or more rows fail validation?
   1. We currently take a conservative approach and reject the dataset upload if any row fails validation. In these cases, we recommend fixing the dataset before retrying.
   2. The API does not return detailed explanations of validation errors. For further debugging and investigation, try uploading the dataset via the UI, which shows more explicit error messages.
3. How long does the ingestion status take to update?
   1. The status typically updates within 1 to 5 minutes.

## Raw API Schema:

{% hint style="info" %}
Updated as of 20th June 2024
{% endhint %}

{% file src="<https://2014478147-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fksq6aNCGbng1fqooERfi%2Fuploads%2F3KdNfEfVCZyM74OV2qQr%2Fv2-write-api-specification.yml?alt=media&token=b117a597-b6b3-4dac-a05c-551758c3d53e>" %}

For further queries, feel free to reach out to us: [Contact the Data.gov.sg team](https://form.gov.sg/6449e5c3664c1b001249acf1)
