Transcript CSV

Overview

If you don’t want to connect directly to a transcript or call recording datasource like Zoom, Google Meet, Grain, or Gong, you can export the transcripts to a CSV and upload it to Artifact. This document explains how the CSV should be formatted, covering both the required and the optional fields. It also covers the optional, supplemental person metadata CSV you can upload. Following this guide should make the upload process straightforward.

Sample Call Transcript CSV

Each of these fields will be covered in more depth, but here is an example of what the call transcript CSV looks like. This example includes the required fields (interaction_id through affiliation columns) and two optional fields (offset_start_time and offset_end_time). No extra metadata fields were added to the CSV.
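
As a rough illustration of that layout, the following Python sketch writes a two-row transcript CSV. Every value in it (IDs, names, emails, timestamps, offsets) is an invented placeholder, and the column order is an assumption based on the order in which the columns are described in this guide.

```python
import csv
import io

# Column names as described in this guide; the order is an assumption.
COLUMNS = [
    "interaction_id", "verbatim_sequence_num", "text", "person_id",
    "person_full_name", "person_email", "affiliation", "created_at",
    "offset_start_time", "offset_end_time",
]

# Placeholder rows: all values are made up for illustration only.
# Offsets follow the worked examples in the column descriptions.
ROWS = [
    ["call-001", 1, "Hi, thanks for joining today.", "p-100",
     "Sam Rep", "sam@example.com", "INTERNAL",
     "2024-01-15T10:00:00.000000+00:00", 0.0, 3.2],
    # Name and email are left empty here, which the guide allows.
    ["call-001", 2, "Happy to be here. I keep hitting a login error.", "p-200",
     "", "", "EXTERNAL",
     "2024-01-15T10:00:03.200000+00:00", 3.2, 8.75],
]


def build_sample_csv() -> str:
    """Return the sample transcript CSV as a string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(ROWS)
    return buf.getvalue()


if __name__ == "__main__":
    print(build_sample_csv())
```

Using the csv module rather than joining strings by hand means that text containing commas or quotes is escaped correctly.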

Transcript CSV Guidelines

Required Columns

The transcript CSV you upload to Artifact must contain the following columns so that Artifact can synthesize the data and link customers together across interactions. The two offset columns at the end of the list are optional.

interaction_id: This is an ID that covers the entire back-and-forth between a customer and a company. For transcripts, this is often the call_id or recording_id. The column is required and the value can’t be null, be an empty string, or contain any spaces. If your data doesn’t have a unique ID for every complete interaction, you’ll need to create one.
verbatim_sequence_num: This specifies the order of the text within the transcript. For each transcript, the verbatim_sequence_num should start at 1. This column is required and values can't be null, an empty string, or contain a space. The pairing of the interaction_id and the verbatim_sequence_num should be unique.
text: This is the text: what the speaker said in that part of the conversation. The column is required and the field can’t be null or an empty string.
person_id: This is the ID of the person who said the text. This column is required and the field can’t be null or an empty string. If your data does not include a person_id, you’ll need to create a unique ID for each person.
person_full_name: This is the speaker’s name. The column is required, but the field can be left empty. If it is left empty, the speaker’s verbatims will be attributed to “unknown” in the app.
person_email: This is the speaker’s email address. This column is required, but the field can be left empty. If possible, please provide it, as it helps us identify a user across all datasources.
affiliation: This describes whether the speaker was a customer (EXTERNAL), someone from within the company (INTERNAL), or an automation bot (AUTOMATION). This column is required and the field must contain EXTERNAL, INTERNAL, or AUTOMATION. The value can’t be null or an empty string.
created_at: This is the timestamp for when the monologue was created. The column is required and the field can’t be null. The format should follow the ISO 8601 standard, as follows: 1990-01-15T00:34:56.026490+09:30.
offset_start_time: This is the starting time of the monologue, measured from the start of the interaction. It should be a decimal number of seconds, which allows for sub-second precision. This column is optional. (Example: 12.34, which indicates that the monologue starts 12.34 seconds into the interaction.)
offset_end_time: This is the ending time of the monologue, measured from the start of the interaction. It should be a decimal number of seconds, which allows for sub-second precision. This column is optional. (Example: 64.51, which indicates that the monologue ends 64.51 seconds into the interaction.)
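
The column rules above can be checked before uploading. The following is a minimal Python sketch, not Artifact’s own validation: the function name validate_transcript_csv and the error wording are hypothetical, and it assumes the file is comma-separated with a header row.

```python
import csv
import io
from datetime import datetime

REQUIRED = ["interaction_id", "verbatim_sequence_num", "text", "person_id",
            "person_full_name", "person_email", "affiliation", "created_at"]
# Required columns whose values may not be empty (name and email may be).
NON_EMPTY = ["interaction_id", "verbatim_sequence_num", "text", "person_id",
             "affiliation", "created_at"]
NO_SPACES = ["interaction_id", "verbatim_sequence_num"]
AFFILIATIONS = {"EXTERNAL", "INTERNAL", "AUTOMATION"}


def validate_transcript_csv(csv_text: str) -> list:
    """Return a list of problems found; an empty list means none were found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing required columns: " + ", ".join(missing)]

    errors = []
    seen = set()
    for n, row in enumerate(reader, start=2):  # row 1 is the header
        for col in NON_EMPTY:
            if not row[col]:
                errors.append(f"row {n}: {col} is empty")
        for col in NO_SPACES:
            if " " in (row[col] or ""):
                errors.append(f"row {n}: {col} contains spaces")
        if row["affiliation"] and row["affiliation"] not in AFFILIATIONS:
            errors.append(f"row {n}: affiliation is not an allowed value")
        if row["created_at"]:
            try:
                # Accepts offsets like +09:30; a trailing "Z" needs Python 3.11+.
                datetime.fromisoformat(row["created_at"])
            except ValueError:
                errors.append(f"row {n}: created_at is not a valid ISO 8601 timestamp")
        key = (row["interaction_id"], row["verbatim_sequence_num"])
        if key in seen:
            errors.append(f"row {n}: duplicate interaction_id/verbatim_sequence_num pair")
        seen.add(key)
    return errors
```

Note that datetime.fromisoformat rejects out-of-range components, such as a seconds value of 60, so a timestamp that looks well-formed can still fail this check.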

Optional transcript metadata columns

If there is additional information about the transcript (but not the customer) that you want to filter by in Artifact, you can add columns with this information. For example, to filter by tags, satisfaction rating, priority, or another value, add columns like these to the CSV:

tag	satisfaction_rating	priority
login error	4	medium

During the CSV upload process, leave these columns unmapped.

Optional person metadata CSV

If there is additional information about the customer that you want to filter by in Artifact, this should NOT be added to the transcript CSV. Instead, once the transcript CSV has been uploaded, there will be a prompt to add a person metadata CSV. Think of this like a separate customer table in the same datasource.

Person metadata CSV required fields

person_id: This should be the same person_id that is used in the transcript CSV. The column is required and the value can’t be null, an empty string, or contain any spaces.
person_email: This is the email of the person. The column is required, but the value can be null or an empty string. Please note that the email helps us perform identity resolution across the different data sources, so please provide it if possible.
created_at: This is when the person was first seen or created within the transcript data. The column is required and the field can’t be null. The format should follow the ISO 8601 standard, as follows: 1990-01-15T00:34:56.026490+09:30.
updated_at: This is the timestamp for the last time the person was updated. The column is required and the field can’t be null. The format should follow the ISO 8601 standard, as follows: 1990-01-15T00:34:56.026490+09:30. If no updates have happened, the value should be the same as the created_at field.
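
If you don’t already have a customer table, the required person fields can be derived from the transcript rows. The sketch below is one possible approach, not a required workflow: build_person_rows is a hypothetical helper, it treats the earliest created_at for a person as the moment they were first seen, and it assumes every timestamp carries a UTC offset so that they can be compared.

```python
from datetime import datetime


def build_person_rows(transcript_rows):
    """Derive one person metadata row per person_id from transcript rows.

    transcript_rows is an iterable of dicts with at least person_id,
    person_email, and created_at (ISO 8601 with a UTC offset).
    """
    first_seen = {}  # person_id -> (parsed timestamp, original string)
    emails = {}      # person_id -> first non-empty email encountered
    for row in transcript_rows:
        pid = row["person_id"]
        ts = datetime.fromisoformat(row["created_at"])
        if pid not in first_seen or ts < first_seen[pid][0]:
            first_seen[pid] = (ts, row["created_at"])
        if row.get("person_email") and pid not in emails:
            emails[pid] = row["person_email"]

    return [
        {
            "person_id": pid,
            "person_email": emails.get(pid, ""),
            "created_at": raw,
            # No updates are known, so updated_at mirrors created_at.
            "updated_at": raw,
        }
        for pid, (_, raw) in sorted(first_seen.items())
    ]
```

The resulting dicts can be written out with csv.DictWriter, and any optional person metadata columns can be added to each row before writing.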

Optional metadata fields

After the required columns, any metadata that you want to filter on can be added. Some examples of person metadata that you would want to filter on include customer_state, customer_country, role, segment, or last_seen.

Important considerations for the transcript CSV

The number of interactions in the CSV will be subtracted from your monthly interaction allowance.
Person data doesn’t affect your monthly interactions.
Don’t upload interaction data older than your lookback period.
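
To estimate the impact of an upload before sending it, you can count the distinct interactions in the file and flag any that fall outside your lookback period. The sketch below rests on several assumptions that this guide does not confirm: that one interaction corresponds to one distinct interaction_id, that the lookback period is expressed in days, and that an interaction is dated by its earliest created_at. The name summarize_upload is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def summarize_upload(rows, lookback_days, now=None):
    """Return (number of distinct interactions, sorted list of stale IDs).

    rows is an iterable of dicts with interaction_id and created_at
    (ISO 8601 with a UTC offset). An interaction is considered stale when
    its earliest created_at is older than the lookback cutoff (assumption).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=lookback_days)

    earliest = {}  # interaction_id -> earliest timestamp seen
    for row in rows:
        ts = datetime.fromisoformat(row["created_at"])
        iid = row["interaction_id"]
        if iid not in earliest or ts < earliest[iid]:
            earliest[iid] = ts

    stale = sorted(iid for iid, ts in earliest.items() if ts < cutoff)
    return len(earliest), stale
```

Rows belonging to the stale interaction IDs can then be removed from the CSV before uploading.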