Challenge #8: Normalizing data into key-value shape

For this challenge we need to over-normalize data in order to import it in our target system:

We start with data shaped like this:

id first_name last_name gender age cell
1012 Jean Reid female 51 Australia
1040 Nikolai Haugsvær male 42 Norway
1056 Florence Larson female 66 Ireland
1007 Romana Köppl female 66 Germany
1016 Fabiano Waterschoot male 66 Netherlands
1027 Kenan Akay male 39 Turkey
1048 Diego Philippe male 51 France
1001 Mar da Paz male 44 Brazil
1033 Benjamin Riviere male 66 France
1031 Dominic Li male 62 Canada

And we need to over-normalize to id,field_name,field_value form:

id field_name field_value
1007 first_name Romana
1016 country Netherlands
1016 last_name Waterschoot
1027 cell (152)-923-5489
1048 last_name Philippe
1056 last_name Larson
1048 cell 06-16-42-26-36
1001 last_name da Paz
1033 cell 06-87-13-00-25
1027 country Turkey
1040 first_name Nikolai
1031 first_name Dominic
1033 last_name Riviere
1007 cell 0175-5800351

Use start.dfl (32.3 KB) to get started

The idea for the solution is to think of our input rows as lists of key-value pairs. Each field in an input row has a name and a value.

If we could iterate over these entries, and produce one output row for each, we got our solution.

The Get Items step does exactly what we need. It expects a list, and iterates over this list producing an output row for each list item.

We want to iterate over all fields, except id, and put the name and value of the field into the row stream. Here’s the configuration for that:

This solution produces a list of keys for the step to iterate over. First we call data.omit on our input row. This gives us a row record without the id field. On that we call data.keys, which gives us the list of keys we want to iterate over.

On the output fields, we use the key as the field_name and use it as an index to retrieve the value from the current row.

After using the Pick Fields step to just select id, field_name, and field_value, we have the desired result

See solution.dfl (37.2 KB) for the implementation.