Snowflake Schema Detection Feature for JSON Files

Data Engineering Simplified
5 min readAug 16, 2023

--

https://youtu.be/a4ndsoZ5Ox8

Data loading processes are crucial for timely insights and decision-making. Snowflake, has add its support for JSON using “Infer Schema” table function that streamlines the process of loading data from JSON files into Snowflake tables. We will explore the importance of this feature and how it empowers data engineers and analysts to expedite data loading while ensuring accuracy and consistency.

Check out my courses in Udemy. Popular course include building utility using Snowpark Python API to ingest CSV data as well as JSON data automatically without writing any DDL/DML statement and save 95% of manual effort.

Importance of Infer Schema in Snowflake:

Traditionally, defining table structure (DDL) while loading data can be a manual and error-prone task, especially when dealing with large datasets. With Snowflake’s “Infer Schema” feature for JSON, this tedious process is simplified. The feature automatically reads a JSON file from a specified stage location(s) and intelligently infers column names and data types based on the JSON file’s contents. This automation not only saves valuable time and effort but also reduces the risk of schema mismatches, ensuring data integrity throughout the data pipeline.

How to Use Infer Schema in Snowflake:

To utilize this powerful feature, data practitioners need to follow a few simple steps.

  1. Ensure that the JSON file is staged in Snowflake with appropriate permissions.
  2. Next, create a new table using the “CREATE TABLE” statement, and instead of specifying explicit column names and data types, use “INFER SCHEMA” as the column definition.
  3. Snowflake will automatically analyze the JSON data file(s) and create the table DDL accordingly.

Watch the video below to see the Infer Schema feature in action and discover how it can efficiency into your data loading workflows

Data Files

Download the JSON data files from this git location.

Part-1 : Create Database, Schema & Stage

-- create a database/schema.
create or replace database json_db;
create or replace schema my_schema;
use schema my_schema;

create or replace stage json_stg;

list @json_stg;

-- creating a json file format
create or replace file format json_file_format
type = 'JSON'
STRIP_OUTER_ARRAY = true;

create or replace file format json_file_format_outer_array_false
type = 'JSON'
STRIP_OUTER_ARRAY = false;

Part-2 : Infer-Schema with Simple JSON File

-- run infer schema and check the data type
select * from table(infer_schema(
location => '@json_stg/json/employee' ,
files => 'simple_emp_single_entity.json',
file_format=>'json_file_format',
ignore_case => true

));
{
"name": "Philip Ayala",
"address": "USCGC Gonzalez\nFPO AA 13268",
"email": "williamthompson@example.com",
"phone_number": [
"(824)948-6728","(824)948-6727"
],
"job": "IT",
"designation": "Software Engineer",
"date_of_joining": "2023-04-03",
"date_of_birth": "1983-06-19",
"age":30,
"height_in_ft": 6.22,
"active_status": true,
"programming_skill": {
"language": "Ruby",
"proficiency_level": "Beginner",
"experience_in_yrs": 2.6
},
"created_at": "2023-08-07T20:53:08.667555",
"updated_at": "2023-08-07T20:53:08.667555"
}

Part-2 : Infer-Schema with Array JSON File

-- lets run it for emp_json_with_array.json
select * from table(infer_schema(
location => '@json_stg/json/employee' ,
files => 'emp_json_with_array.json',
file_format=>'json_file_format_outer_array_false'
));
[
{
"name": "Steven Smith",
"address": "4116 Jeffery Stravenue\nSmithview, MN 28257",
"email": "annette03@example.net",
"phone_number": "(825)653-0970x0667",
"job": "Automotive engineer",
"designation": "Lobbyist",
"date_of_joining": "2020-06-08",
"date_of_birth": "2016-12-19",
"programming_skill": {
"language": "JavaScript",
"proficiency_level": "Expert",
"experience": null
},
"created_at": "2023-08-07T21:09:34.992679",
"updated_at": "2023-08-07T21:09:34.992679"
},
{
"name": "Christina Walsh",
"address": "84143 Christopher Station\nJamesland, FM 35789",
"email": "dianafisher@example.net",
"phone_number": "(528)912-8622x763",
"job": "Government social research officer",
"designation": "Medical technical officer",
"date_of_joining": "2021-10-31",
"date_of_birth": "1984-05-02",
"programming_skill": {
"language": "Python",
"proficiency_level": "Intermediate",
"experience": null
},
"created_at": "2023-08-07T21:09:34.993251",
"updated_at": "2023-08-07T21:09:34.993251"
}
]

Part-3 : Infer-Schema with Object JSON File

-- lets run it for emp_json_with_array.json
select * from table(infer_schema(
location => '@json_stg/json/employee' ,
files => 'emp_json_with_dic.json',
file_format=>'json_file_format'
));
{
"emp_1": {
"name": "Steven Smith",
"address": "4116 Jeffery Stravenue\nSmithview, MN 28257",
"email": "annette03@example.net",
"phone_number": "(825)653-0970x0667",
"job": "Automotive engineer",
"designation": "Lobbyist",
"date_of_joining": "2020-06-08",
"date_of_birth": "2016-12-19",
"programming_skill": {
"language": "JavaScript",
"proficiency_level": "Expert",
"experience": null
},
"created_at": "2023-08-07T21:09:34.992679",
"updated_at": "2023-08-07T21:09:34.992679"
},
"emp_2": {
"name": "Christina Walsh",
"address": "84143 Christopher Station\nJamesland, FM 35789",
"email": "dianafisher@example.net",
"phone_number": "(528)912-8622x763",
"job": "Government social research officer",
"designation": "Medical technical officer",
"date_of_joining": "2021-10-31",
"date_of_birth": "1984-05-02",
"programming_skill": {
"language": "Python",
"proficiency_level": "Intermediate",
"experience": null
},
"created_at": "2023-08-07T21:09:34.993251",
"updated_at": "2023-08-07T21:09:34.993251"
}
}

Part-4 : Infer-Schema with New Line JSON File

-- lets run it for emp_json_with_newline.json
select * from table(infer_schema(
location => '@json_stg/json/employee' ,
files => 'emp_json_with_comma.json',
file_format=>'json_file_format'
));
{
"name": "Steven Smith",
"address": "4116 Jeffery Stravenue\nSmithview, MN 28257",
"email": "annette03@example.net",
"phone_number": "(825)653-0970x0667",
"job": "Automotive engineer",
"designation": "Lobbyist",
"date_of_joining": "2020-06-08",
"date_of_birth": "2016-12-19",
"programming_skill": {
"language": "JavaScript",
"proficiency_level": "Expert",
"experience": null
},
"created_at": "2023-08-07T21:09:34.992679",
"updated_at": "2023-08-07T21:09:34.992679"
},
{
"name": "Christina Walsh",
"address": "84143 Christopher Station\nJamesland, FM 35789",
"email": "dianafisher@example.net",
"phone_number": "(528)912-8622x763",
"job": "Government social research officer",
"designation": "Medical technical officer",
"date_of_joining": "2021-10-31",
"date_of_birth": "1984-05-02",
"programming_skill": {
"language": "Python",
"proficiency_level": "Intermediate",
"experience": null
},
"created_at": "2023-08-07T21:09:34.993251",
"updated_at": "2023-08-07T21:09:34.993251"
}

Part-5 : Infer-Schema with New Line JSON File

-- lets run it for emp_json_with_newline.json
select * from table(infer_schema(
location => '@json_stg/json/employee' ,
files => 'emp_json_with_newline.json',
file_format=>'json_file_format'
));
{
"name": "Steven Smith",
"address": "4116 Jeffery Stravenue\nSmithview, MN 28257",
"email": "annette03@example.net",
"phone_number": "(825)653-0970x0667",
"job": "Automotive engineer",
"designation": "Lobbyist",
"date_of_joining": "2020-06-08",
"date_of_birth": "2016-12-19",
"programming_skill": {
"language": "JavaScript",
"proficiency_level": "Expert",
"experience": null
},
"created_at": "2023-08-07T21:09:34.992679",
"updated_at": "2023-08-07T21:09:34.992679"
}
{
"name": "Christina Walsh",
"address": "84143 Christopher Station\nJamesland, FM 35789",
"email": "dianafisher@example.net",
"phone_number": "(528)912-8622x763",
"job": "Government social research officer",
"designation": "Medical technical officer",
"date_of_joining": "2021-10-31",
"date_of_birth": "1984-05-02",
"programming_skill": {
"language": "Python",
"proficiency_level": "Intermediate",
"experience": null
},
"created_at": "2023-08-07T21:09:34.993251",
"updated_at": "2023-08-07T21:09:34.993251"
}

Create Table From JSON File Using Infer-Schema

-- lets create a table using template keyword
create or replace transient table emp_01
using template (
select array_agg(object_construct(*)) from table(
infer_schema(
location => '@json_stg/json/employee' ,
files => 'simple_emp_single_entity.json',
file_format=>'json_file_format'
)
)
);

desc table emp_01;

copy into emp_01 from (
select
$1:active_status::BOOLEAN,
$1:address::TEXT,
$1:age::NUMBER(2, 0),
$1:created_at::TIMESTAMP_NTZ,
$1:date_of_birth::DATE,
$1:date_of_joining::DATE,
$1:designation::TEXT,
$1:email::TEXT,
$1:height_in_ft::NUMBER(3, 2),
$1:job::TEXT,
$1:name::TEXT,
$1:phone_number::ARRAY,
$1:programming_skill::OBJECT,
$1:updated_at::TIMESTAMP_NTZ
from @json_stg/json/employee/simple_emp_single_entity.json
(file_format => json_file_format) t)
on_error = 'Continue' ;

Automation Using Snowpark Python API

Click on this link (under-development) to access the Snowpark Python API automation code

--

--

Data Engineering Simplified

Passionate About Data Engineering, Snowflake Cloud Data Warehouse & Cloud Native Data Services