Build a Template Data Dictionary

Use Case Example - Steps to build a Template Data Dictionary

This use case offers steps to build a Template Data Dictionary using the Claravine API, containing all field names and their types, as well as all possible list values for list fields. The output is a CSV file in this example.

Claravine API endpoints used in this use case:

  • List Datasets (GET /v1/datasets/)
  • Dataset Definition (GET /v1/datasets/{dataset_uuid}/definition)
  • Get Pick List (GET /v1/lists/{listId})

The output of the data dictionary sample application will be as follows (displayed as a transposed CSV):

Template Name        | Dessert Menu |                           |                           |                          |
Field Name           | Claravine ID | Dessert Category          | Dessert Food              | Name of the Dessert Food | Price of Dessert Food
Field Type           | cv_id        | list                      | list                      | text                     | text
List ID              |              | 10396                     | 10396                     |                          |
List Name            |              | Category                  | Category                  |                          |
List Last Updated At |              | 2025-03-08T10:11:33+00:00 | 2025-03-08T10:11:33+00:00 |                          |
List Column ID       |              | 52641                     | 52640                     |                          |
List Column Name     |              | Category                  | Food                      |                          |
value_1              |              | Breads                    | Breakfast                 |                          |
value_2              |              | Eggies                    | Dessert                   |                          |
value_3              |              | Pies                      | Lunch                     |                          |
value_4              |              | Salads                    |                           |                          |
value_5              |              | Sandwiches                |                           |                          |
value_6              |              | Sides                     |                           |                          |
value_7              |              | Sweets                    |                           |                          |
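For reference, the "Get Pick List" response has roughly the shape sketched below. The payload here is illustrative (trimmed to the keys the script in this guide consumes), not a complete API response:

```python
# Illustrative "Get Pick List" (GET /v1/lists/{listId}) payload for the
# Dessert Menu example; the keys match what the script below reads.
sample_list = {
    "name": "Category",
    "updated_at": "2025-03-08T10:11:33+00:00",
    "headers": [
        {"col_id": 52641, "name": "Category"},
        {"col_id": 52640, "name": "Food"},
    ],
    "rows": [
        {"row_data": [{"col_id": 52641, "value": "Breads"},
                      {"col_id": 52640, "value": "Breakfast"}]},
        {"row_data": [{"col_id": 52641, "value": "Pies"},
                      {"col_id": 52640, "value": "Lunch"}]},
    ],
}

def column_values(list_data, col_id):
    """Collect the distinct values of one list column, sorted alphabetically."""
    return sorted({cell["value"]
                   for row in list_data["rows"]
                   for cell in row["row_data"]
                   if cell["col_id"] == col_id})

print(column_values(sample_list, 52641))  # ['Breads', 'Pies']
```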

A customer can follow these steps (script available below) and run them at a frequency based on business needs (e.g. daily, weekly, monthly):

  1. First, select the Template you intend to use to build your Template Data Dictionary. Note that a template is called a dataset in the Claravine API.
  2. Then, call the "List Datasets" endpoint (GET /v1/datasets/) to fetch all datasets in your account. Find the name of the dataset you intend to use and copy its UUID, shown right below the name.
  3. Now that you have the dataset_uuid, use it to call the "Dataset Definition" endpoint (GET /v1/datasets/{dataset_uuid}/definition) to retrieve information about all fields used by the dataset.
  4. For each list field returned above, call the "Get Pick List" endpoint (GET /v1/lists/{listId}) to retrieve that list's contents in JSON format.
  5. Process each list and keep only the columns that the template actually uses.
  6. Write each field's metadata (name and type) to the CSV file.
  7. Append the list values to their respective field columns in the CSV file.
  8. Upload the generated CSV file to cloud storage, or to any location your team can access globally, or where your workflow/application can consume it.
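Steps 3 through 5 boil down to mapping each list-governed field to its (list_id, list_column_id) pair. A minimal sketch of that grouping, using an illustrative definition payload shaped like the one the full script below consumes:

```python
def list_columns_in_use(template_data):
    """Map list_id -> set of list_column_ids referenced by the template's fields."""
    in_use = {}
    for field in template_data["template_fields"]:
        gov = field["field"].get("list_governance")
        if gov:  # Only list-governed fields carry a list_id/list_column_id pair
            in_use.setdefault(gov["list_id"], set()).add(gov["list_column_id"])
    return in_use

# Illustrative "Dataset Definition" payload: two fields sharing list 10396,
# plus one plain text field with no list governance.
definition = {"template_fields": [
    {"field": {"name": "Dessert Category",
               "list_governance": {"list_id": 10396, "list_column_id": 52641}}},
    {"field": {"name": "Dessert Food",
               "list_governance": {"list_id": 10396, "list_column_id": 52640}}},
    {"field": {"name": "Name of the Dessert Food", "list_governance": None}},
]}

print(sorted(list_columns_in_use(definition)[10396]))  # [52640, 52641]
```

Grouping columns by list_id means each list is fetched from the API only once, even when several template fields draw from different columns of the same list.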

Python script to create a Template Data Dictionary

  • Note: Before attempting to run the script, make sure you have the following libraries installed: requests, pandas, and simple-term-menu (e.g. pip install requests pandas simple-term-menu).
  • In order to run the script you need to edit the first code lines and add values for:
    • DOMAIN (Account domain, e.g. DOMAIN = "your_account.claravine.com")
    • API_KEY (Claravine Public API Key)
    • API_SECRET (Claravine Public API Secret)
  • Then save the following code into a file (e.g. "script_data_dictionary.py") and execute it in one of two ways:
    • Option 1: python script_data_dictionary.py (no dataset_uuid provided). The script will display all template names in a menu from which you can select the desired template.
    • Option 2: python script_data_dictionary.py "your_dataset_UUID" (dataset_uuid as a command-line argument, if your application previously queried a template UUID).
  • The script generates a transposed CSV file; the raw file may look odd in a text editor, but once opened in Excel, Google Sheets, or a similar tool, the Template Data Dictionary is easy to read.
import os
import sys
import requests
import pandas as pd
from datetime import datetime
from simple_term_menu import TerminalMenu

'''TODO: Edit the following lines to use your values'''
DOMAIN = '' # Account domain (e.g. 'account1.claravine.com')
API_KEY = '' # Claravine API Key
API_SECRET = '' # Claravine API Secret
'''End of lines to be edited'''

# Set headers and URL
HEADERS = {
    'accept': 'application/json',
    'x-claravine-key': API_KEY,
    'x-claravine-secret': API_SECRET
}

# Fetch all list values from columns in template
def get_all_list_data(template_data, domain, headers):
    list_column_values = dict()
    list_metadata = dict()
    
    # First, we fetch all list_column_ids that are being used and group by their list_id
    for field in template_data['template_fields']:
        if field['field']['list_governance']:
            list_id = field['field']['list_governance']['list_id']
            if list_id not in list_column_values: list_column_values[list_id] = dict()
            col_id = field['field']['list_governance']['list_column_id']
            list_column_values[list_id][col_id] = set()
    
    # Then, we call GET v1/lists/{listId} for each list, to fetch all list values and filter by the columns being used by the template
    for list_id, column_ids in list_column_values.items():
        url = f"https://{domain}/v1/lists/{list_id}/"
        response = requests.get(url, headers=headers)
        if response.status_code != 200:
            error_msg = f"[ERROR] An unexpected error happened during the GET {url} request. Message: {response.text}"
            raise RuntimeError(error_msg)

        list_data = response.json()
        for row in list_data['rows']:
            for cell in row['row_data']:
                if (cell['col_id'] in list_column_values[list_id]):
                    list_column_values[list_id][cell['col_id']].add(cell['value'])

        # Store list metadata
        list_metadata[list_id] = {
            'name': list_data.get('name', ''),
            'updated_at': list_data.get('updated_at', ''),
            'columns': {header['col_id']: header['name'] 
                        for header in list_data.get('headers', [])}
        }
    return list_column_values, list_metadata


# Call GET /v1/datasets/ endpoint to retrieve all templates names and UUIDs
def fetch_all_templates():
    datasets_url = f"https://{DOMAIN}/v1/datasets/"
    datasets_response = requests.get(datasets_url, headers=HEADERS)

    if (datasets_response.status_code != 200):
        error_msg = f"[ERROR] An unexpected error happened during the GET {datasets_url} request. Message: {datasets_response.text}"
        raise RuntimeError(error_msg)

    # Filter only names and UUIDs
    return [[template['name'], template['uuid']] for template in datasets_response.json()['datasets']]


# Pull all templates available and display a menu so the user can select one from the terminal
def select_from_terminal_menu(): 
    templates_available = fetch_all_templates()

    # Create formatted display strings
    menu_entries = [
        f"Name: '{name}', UUID: '{uuid}'" 
        for name, uuid in templates_available
    ]
    # Create a mapping from display string to UUID
    display_to_uuid = {
        entry: uuid 
        for entry, (_, uuid) in zip(menu_entries, templates_available)
    }
    terminal_menu = TerminalMenu(
        menu_entries,  # Show formatted entries
        title="Type to filter ↓  |  ↑/↓ to navigate  |  Enter to select:",
        search_key=None,  # Enable search by typing any character
        preview_command="echo 'Selected Template: {}'",
        preview_size=0.3,
        cycle_cursor=True,
        clear_screen=True
    )
    
    selected_entry = terminal_menu.show()
    if selected_entry is None:  # User exited the menu (Esc/q) without selecting
        raise RuntimeError("[ERROR] No template selected.")
    dataset_uuid = display_to_uuid[menu_entries[selected_entry]]
    return dataset_uuid


# Creates default file path to store the CSV (relies on the module-level dataset_uuid set in __main__)
def create_default_file_path():
    timestamp = datetime.now().strftime("%m_%d_%Y-%Hh%M")
    output_dir = os.path.join(os.getcwd(), 'data_dictionary')
    os.makedirs(output_dir, exist_ok=True)  # Creates the directory path if needed
    output_file_path = os.path.join(output_dir, f"data_dictionary_{dataset_uuid}-{timestamp}.csv")
    return output_file_path



if __name__ == '__main__':
    if (DOMAIN == '' or API_KEY == '' or API_SECRET == ''):
        raise RuntimeError("[ERROR] Please edit the first lines of the script and provide values for: 'DOMAIN', 'API_KEY' and 'API_SECRET'.")

    # Get dataset UUID 
    try: 
        dataset_uuid = sys.argv[1]
    except IndexError:
        dataset_uuid = select_from_terminal_menu()
    print(f"Selected dataset UUID: {dataset_uuid}")

    output_file_path = None  # NOTE: (Optional) You can edit this line to specify the path folder/name where you want the CSV to be created. 
    if not output_file_path:
        # Otherwise, the CSV file will be created in: <current_dir>/data_dictionary/data_dictionary_<uuid>-<timestamp>.csv
        output_file_path = create_default_file_path()
        print(f"[WARNING] Using default filepath, the result Data Dictionary CSV file will be created at: {output_file_path}")


    # 1. Call GET /v1/datasets/{templateUuid}/definition endpoint to retrieve all template fields
    url_template_definition = f"https://{DOMAIN}/v1/datasets/{dataset_uuid}/definition"
    template_definition = requests.get(url_template_definition, headers=HEADERS)
    if (template_definition.status_code != 200):
        error_msg = f"[ERROR] An unexpected error happened during the GET {url_template_definition} request. Message: {template_definition.text}"
        raise RuntimeError(error_msg)
    template_data = template_definition.json()

    # 2. Fetch list values from all list columns that are used by the current template
    list_columns_values, list_metadata = get_all_list_data(template_data, DOMAIN, HEADERS)
    template_name = template_data['name']
    num_fields = len(template_data['template_fields'])

    # 3. Build initial structure for horizontal CSV oriented
    csv_data = {
        'Template Name': [template_name] + [''] * (num_fields - 1),
        'Field Name': [field['field']['name'] for field in template_data['template_fields']],
        'Field Type': [
            field['field'].get('governance_type') or field.get('governance_type', '')
            for field in template_data['template_fields']
        ],
        'List ID': [field['field']['list_governance']['list_id'] 
                if field['field']['list_governance'] else '' 
                for field in template_data['template_fields']],
        'List Name': [list_metadata.get(field['field']['list_governance']['list_id'], {}).get('name', '')
                    if field['field']['list_governance'] else ''
                    for field in template_data['template_fields']],
        'List Last Updated At': [list_metadata.get(field['field']['list_governance']['list_id'], {}).get('updated_at', '')
                    if field['field']['list_governance'] else ''
                    for field in template_data['template_fields']],
        'List Column ID': [field['field']['list_governance']['list_column_id'] 
                        if field['field']['list_governance'] else '' 
                        for field in template_data['template_fields']],
        'List Column Name': [list_metadata.get(field['field']['list_governance']['list_id'], {})
                            .get('columns', {}).get(field['field']['list_governance']['list_column_id'], '')
                            if field['field']['list_governance'] else ''
                            for field in template_data['template_fields']]
    }

    # 4. Determine the longest list of values, to correctly size the transposed CSV columns
    max_values = max(len(i) for list_columns in list_columns_values.values() for i in list_columns.values()) if (list_columns_values) else 0

    # 5. Build each CSV row to add list values
    for row_num in range(max_values):
        csv_value_row = []
        for i, field in enumerate(template_data['template_fields']):
            if field['field']['list_governance']:
                list_id = field['field']['list_governance']['list_id']
                col_id = field['field']['list_governance']['list_column_id']
                values = sorted(list_columns_values[list_id][col_id])
                csv_value_row.append(values[row_num] if row_num < len(values) else '')
            else:
                csv_value_row.append('')
        csv_data[f'value_{row_num + 1}'] = csv_value_row  # Matches the value_N row labels below

    # 6. Use pandas dataframe to transpose csv_data
    df = pd.DataFrame(csv_data)
    transposed_df = df.T

    # Convert vertical CSV headers into horizontal row names
    row_names = [
        "Template Name",
        "Field Name", 
        "Field Type",
        "List ID",
        "List Name",
        "List Last Updated At",
        "List Column ID",
        "List Column Name"
    ] + [f"value_{i+1}" for i in range(max_values)]
    transposed_df.index = row_names[:len(transposed_df)]

    # Save to CSV
    transposed_df.to_csv(output_file_path, header=False)
    
    print("Finished processing all template fields and list values. Data Dictionary can be seen at: " + output_file_path)
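
Once generated, the dictionary can also be consumed programmatically. A minimal standard-library sketch for reading the transposed CSV back into a dictionary keyed by row name (assuming the layout produced by this script, with row labels in the first column):

```python
import csv

def load_data_dictionary(path):
    """Read the transposed Data Dictionary CSV back as {row_name: [cell, ...]}."""
    with open(path, newline="") as f:
        return {row[0]: row[1:] for row in csv.reader(f) if row}
```

For example, `load_data_dictionary(output_file_path)["Field Name"]` returns the template's field names in column order, which a downstream validation workflow can compare against incoming data.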