Build a Template Data Dictionary
Use Case Example - Steps to build a Template Data Dictionary
This use case walks through the steps to build a Template Data Dictionary using the Claravine API, containing all field names and their types, as well as all possible values for list fields. In this example the output is a CSV file.

Claravine API endpoints used in this use case: "List Datasets" (GET /v1/datasets/), "Dataset Definition" (GET /v1/datasets/{dataset_uuid}/definition), and "Get Pick List" (GET /v1/lists/{listId}).

The output of the data dictionary sample application will be as follows (displayed as a transposed CSV):
| Template Name | Dessert Menu | | | | |
| --- | --- | --- | --- | --- | --- |
| Field Name | Claravine ID | Dessert Category | Dessert Food | Name of the Dessert Food | Price of Dessert Food |
| Field Type | cv_id | list | list | text | text |
| List ID | | 10396 | 10396 | | |
| List Name | | Category | Category | | |
| List Last Updated At | | 2025-03-08T10:11:33+00:00 | 2025-03-08T10:11:33+00:00 | | |
| List Column ID | | 52641 | 52640 | | |
| List Column Name | | Category | Food | | |
| value_1 | | Breakfast | Breads | | |
| value_2 | | Dessert | Eggies | | |
| value_3 | | Lunch | Pies | | |
| value_4 | | | Salads | | |
| value_5 | | | Sandwiches | | |
| value_6 | | | Sides | | |
| value_7 | | | Sweets | | |
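Once the CSV exists, downstream code can load it back into a field-to-type mapping. The sketch below assumes the transposed layout shown above, and uses a trimmed, inline copy of the sample data rather than a real file.

```python
import io
import pandas as pd

# A few rows of the transposed CSV shown above (trimmed sample data).
raw_csv = """Template Name,Dessert Menu,,,,
Field Name,Claravine ID,Dessert Category,Dessert Food,Name of the Dessert Food,Price of Dessert Food
Field Type,cv_id,list,list,text,text
value_1,,Breakfast,Breads,,
value_2,,Dessert,Eggies,,
"""

# Read with no header row, use the first column as the index,
# then transpose back so each template field becomes one row.
df = pd.read_csv(io.StringIO(raw_csv), header=None, index_col=0).T

# Map each field name to its type.
field_types = dict(zip(df["Field Name"], df["Field Type"]))
print(field_types)
```

With a real file, `io.StringIO(raw_csv)` would simply be replaced by the CSV path.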
A customer can follow these steps (script available below) and run them at a frequency based on business needs (e.g. daily, weekly, monthly):
- First, select the template you intend to use to build your Template Data Dictionary. Note that a template is called a dataset in the Claravine API.
- Then, call the "List Datasets" endpoint (`GET /v1/datasets/`) to fetch all datasets from your account. Look for the name of the dataset you intend to use and copy its UUID, shown right below the name.
- Now that you have the `dataset_uuid`, use it to call the "Dataset Definition" endpoint (`GET /v1/datasets/{dataset_uuid}/definition`) to get information about all fields used by the dataset.
- For each list field returned above, call the "Get Pick List" endpoint (`GET /v1/lists/{listId}`) to get each list's information in JSON format.
- Process all lists and keep only the columns that are used by the template.
- Write the output of each field (name and type) to the CSV file.
- Append the list values into their respective fields in the CSV file.
- Upload the generated CSV file to cloud storage, or wherever your team or your workflow/application can access it.
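The list-processing step above (filtering a pick list down to the columns the template actually uses) can be sketched in isolation. The payload below is made up, but it follows the rows → row_data → col_id/value shape that the full script below expects from the Get Pick List endpoint.

```python
# Hypothetical "Get Pick List" payload, shaped like the response the
# script below consumes (rows -> row_data -> col_id/value).
pick_list_payload = {
    "name": "Category",
    "rows": [
        {"row_data": [{"col_id": 52641, "value": "Breakfast"}, {"col_id": 52640, "value": "Breads"}]},
        {"row_data": [{"col_id": 52641, "value": "Dessert"}, {"col_id": 52640, "value": "Eggies"}]},
        {"row_data": [{"col_id": 52641, "value": "Lunch"}, {"col_id": 52640, "value": "Pies"}]},
    ],
}

def collect_column_values(payload, wanted_col_ids):
    """Keep only the values of the list columns actually used by the template."""
    values = {col_id: set() for col_id in wanted_col_ids}
    for row in payload["rows"]:
        for cell in row["row_data"]:
            if cell["col_id"] in values:
                values[cell["col_id"]].add(cell["value"])
    return values

# Suppose the template only uses column 52641 ("Category").
print(collect_column_values(pick_list_payload, {52641}))
```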
Python script to create a Template Data Dictionary
- Note: Before attempting to run the script, make sure you have the following libraries installed:
  - `requests` (https://pypi.org/project/requests/)
  - `pandas` (https://pandas.pydata.org/docs/getting_started/install.html)
  - `simple_term_menu` (https://pypi.org/project/simple-term-menu/)
- To run the script, edit the first code lines and add values for:
  - `DOMAIN` (account domain, e.g. `DOMAIN = "your_account.claravine.com"`)
  - `API_KEY` (Claravine Public API Key)
  - `API_SECRET` (Claravine Public API Secret)
- Then save the following code into a file (e.g. "script_data_dictionary.py") and execute it in one of two ways:
  - Option 1: `python script_data_dictionary.py` (no `dataset_uuid` provided). The script will display all template names in a menu where you can select the desired template.
  - Option 2: `python script_data_dictionary.py "your_dataset_UUID"` (pass the `dataset_uuid` as a command-line argument if your application previously queried a template UUID).
- The script generates a transposed CSV, so the raw file can be hard to read in a text editor, but once it is opened in Excel, Google Sheets, etc., the Template Data Dictionary is easy to read.
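Before running the full script, it can help to verify your credentials with a single request. The domain and key values below are placeholders; the headers match the ones the script sets.

```shell
# Optional sanity check of your credentials (replace the placeholders).
# A 200 response listing your datasets means the key pair works.
curl -s "https://your_account.claravine.com/v1/datasets/" \
  -H "accept: application/json" \
  -H "x-claravine-key: YOUR_API_KEY" \
  -H "x-claravine-secret: YOUR_API_SECRET"
```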
import os
import sys
import csv
import requests
import pandas as pd
from datetime import datetime
from simple_term_menu import TerminalMenu
'''TODO: Edit the following lines to use your values'''
DOMAIN = '' # Account domain (e.g. 'account1.claravine.com')
API_KEY = '' # Claravine API Key
API_SECRET = '' # Claravine API Secret
'''End of lines to be edited'''
# Set the request headers used for all API calls
HEADERS = {
'accept': 'application/json',
'x-claravine-key': API_KEY,
'x-claravine-secret': API_SECRET
}
# Fetch all list values from columns in template
def get_all_list_data(template_data, domain, headers):
list_column_values = dict()
list_metadata = dict()
# First, we fetch all list_column_ids that are being used and group by their list_id
for field in template_data['template_fields']:
if field['field']['list_governance']:
list_id = field['field']['list_governance']['list_id']
if list_id not in list_column_values: list_column_values[list_id] = dict()
col_id = field['field']['list_governance']['list_column_id']
list_column_values[list_id][col_id] = set()
# Then, we call GET v1/lists/{listId} for each list, to fetch all list values and filter by the columns being used by the template
for list_id, column_ids in list_column_values.items():
url = f"https://{domain}/v1/lists/{list_id}/"
response = requests.get(url, headers=headers)
if response.status_code != 200:
            error_msg = f"[ERROR] An unexpected error happened during the GET {url} request. Message: {response.text}"
raise RuntimeError(error_msg)
list_data = response.json()
for row in list_data['rows']:
for cell in row['row_data']:
if (cell['col_id'] in list_column_values[list_id]):
list_column_values[list_id][cell['col_id']].add(cell['value'])
# Store list metadata
list_metadata[list_id] = {
'name': list_data.get('name', ''),
'updated_at': list_data.get('updated_at', ''),
'columns': {header['col_id']: header['name']
for header in list_data.get('headers', [])}
}
return list_column_values, list_metadata
# Call GET /v1/datasets/ endpoint to retrieve all templates names and UUIDs
def fetch_all_templates():
datasets_url = f"https://{DOMAIN}/v1/datasets/"
    datasets_response = requests.get(datasets_url, headers=HEADERS)
if (datasets_response.status_code != 200):
        error_msg = f"[ERROR] An unexpected error happened during the GET {datasets_url} request. Message: {datasets_response.text}"
raise RuntimeError(error_msg)
# Filter only names and UUIDs
return [[template['name'], template['uuid']] for template in datasets_response.json()['datasets']]
# Pull all templates available and display them in a menu so the user can select one from the terminal
def select_from_terminal_menu():
templates_available = fetch_all_templates()
# Create formatted display strings
menu_entries = [
f"Name: '{name}', UUID: '{uuid}'"
for name, uuid in templates_available
]
# Create a mapping from display string to UUID
display_to_uuid = {
entry: uuid
for entry, (_, uuid) in zip(menu_entries, templates_available)
}
terminal_menu = TerminalMenu(
menu_entries, # Show formatted entries
title="Type to filter ↓ | ↑/↓ to navigate | Enter to select:",
search_key=None, # Enable search by typing any character
preview_command="echo 'Selected Template: {}'",
preview_size=0.3,
cycle_cursor=True,
clear_screen=True
)
selected_entry = terminal_menu.show()
dataset_uuid = display_to_uuid[menu_entries[selected_entry]]
return dataset_uuid
# Creates the default file path to store the CSV
# NOTE: relies on the module-level dataset_uuid assigned in __main__
def create_default_file_path():
timestamp = datetime.now().strftime("%m_%d_%Y-%Hh%M")
output_dir = os.path.join(os.getcwd(), 'data_dictionary')
os.makedirs(output_dir, exist_ok=True) # Creates the directory path if needed
output_file_path = os.path.join(output_dir, f"data_dictionary_{dataset_uuid}-{timestamp}.csv")
return output_file_path
if __name__ == '__main__':
if (DOMAIN == '' or API_KEY == '' or API_SECRET == ''):
error_msg = f"[ERROR] Please edit the first lines of the script and provide values for: 'DOMAIN', 'API_KEY' and 'API_SECRET'."
raise RuntimeError(error_msg)
# Get dataset UUID
try:
dataset_uuid = sys.argv[1]
except IndexError:
dataset_uuid = select_from_terminal_menu()
print(f"Selected dataset UUID: {dataset_uuid}")
output_file_path = None # NOTE: (Optional) You can edit this line to specify the path folder/name where you want the CSV to be created.
if not output_file_path:
# Otherwise, the CSV file will be created in: <current_dir>/data_dictionary/data_dictionary_<uuid>-<timestamp>.csv
output_file_path = create_default_file_path()
print(f"[WARNING] Using default filepath, the result Data Dictionary CSV file will be created at: {output_file_path}")
# 1. Call GET /v1/datasets/{templateUuid}/definition endpoint to retrieve all template fields
url_template_definition = f"https://{DOMAIN}/v1/datasets/{dataset_uuid}/definition"
    template_definition = requests.get(url_template_definition, headers=HEADERS)
if (template_definition.status_code != 200):
        error_msg = f"[ERROR] An unexpected error happened during the GET {url_template_definition} request. Message: {template_definition.text}"
raise RuntimeError(error_msg)
template_data = template_definition.json()
# 2. Fetch list values from all list columns that are used by the current template
list_columns_values, list_metadata = get_all_list_data(template_data, DOMAIN, HEADERS)
template_name = template_data['name']
num_fields = len(template_data['template_fields'])
# 3. Build initial structure for horizontal CSV oriented
csv_data = {
'Template Name': [template_name] + [''] * (num_fields - 1),
'Field Name': [field['field']['name'] for field in template_data['template_fields']],
'Field Type': [
            field['field'].get('governance_type') or field.get('governance_type', '')
for field in template_data['template_fields']
],
'List ID': [field['field']['list_governance']['list_id']
if field['field']['list_governance'] else ''
for field in template_data['template_fields']],
'List Name': [list_metadata.get(field['field']['list_governance']['list_id'], {}).get('name', '')
if field['field']['list_governance'] else ''
for field in template_data['template_fields']],
'List Last Updated At': [list_metadata.get(field['field']['list_governance']['list_id'], {}).get('updated_at', '')
if field['field']['list_governance'] else ''
for field in template_data['template_fields']],
'List Column ID': [field['field']['list_governance']['list_column_id']
if field['field']['list_governance'] else ''
for field in template_data['template_fields']],
'List Column Name': [list_metadata.get(field['field']['list_governance']['list_id'], {})
.get('columns', {}).get(field['field']['list_governance']['list_column_id'], '')
if field['field']['list_governance'] else ''
for field in template_data['template_fields']]
}
    # 4. Determine the longest list length to correctly arrange the horizontal CSV columns
max_values = max(len(i) for list_columns in list_columns_values.values() for i in list_columns.values()) if (list_columns_values) else 0
# 5. Build each CSV row to add list values
for row_num in range(max_values):
csv_value_row = []
for i, field in enumerate(template_data['template_fields']):
if field['field']['list_governance']:
list_id = field['field']['list_governance']['list_id']
col_id = field['field']['list_governance']['list_column_id']
                values = sorted(list_columns_values[list_id][col_id])
csv_value_row.append(values[row_num] if row_num < len(values) else '')
else:
csv_value_row.append('')
csv_data[f'value_{row_num}'] = csv_value_row
# 6. Use pandas dataframe to transpose csv_data
df = pd.DataFrame(csv_data)
transposed_df = df.T
    # Convert vertical CSV headers into horizontal row names
row_names = [
"Template Name",
"Field Name",
"Field Type",
"List ID",
"List Name",
"List Last Updated At",
"List Column ID",
"List Column Name"
] + [f"value_{i+1}" for i in range(max_values)]
transposed_df.index = row_names[:len(transposed_df)]
# Save to CSV
transposed_df.to_csv(output_file_path, header=False)
print("Finished processing all template fields and list values. Data Dictionary can be seen at: " + output_file_path)
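As noted above, the dictionary can be regenerated on a schedule; a cron entry is one simple option. The interpreter path, script path, log path, and UUID below are placeholders for your environment.

```shell
# Example crontab entry (paths and UUID are placeholders):
# Rebuild the data dictionary every day at 06:00
0 6 * * * /usr/bin/python3 /path/to/script_data_dictionary.py "your_dataset_UUID" >> /var/log/data_dictionary.log 2>&1
```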