Challenge #13: Extract Data from JSON

For this challenge, we’re given some JSON that contains a list of objects of differing structures. We’re tasked with extracting the id and name of each object for further processing.

Data Source

[
  {"id": 54, "name": "Professor Moriarty"},
  {"id": 42, "first_name": "John", "last_name": "Watson"},
  {"error": "invalid record"},
  {
    "id": 221,
    "name": "Sherlock Holmes",
    "photo": {
      "id":412395287,
      "thumbnails": [{"id":46129037}]
    }
  },
  {"id": 27, "first_name": "Irene", "last_name": "Adler"}
]

Processing

  • In case an item has an id key, we extract it into a field. Missing ids are represented as -1.

  • In case a name key is present, we use that directly. If there are first_name and last_name keys instead, we concatenate the contents into a name field. If no name is available, we use the value "N/A" to represent that fact.

Expected Output

id name
54 Professor Moriarty
42 John Watson
-1 N/A
221 Sherlock Holmes
27 Irene Adler

Data File

Get started with extract-from-json.dfl (4.4 KB) - it contains the source data.

Here’s a solution.dfl (13.9 KB) to this challenge.

Screenshot 2020-07-13 at 11.30.44

It starts with the JSON string, converts the data into the record types we need, and places each record into the row stream.

The core lies in the calculator step: “Get names and ids”. Let’s have a look at it.

The output field contains the result of mapping each entry of the JSON array through a pick_fields function.

The pick_fields function takes a record as present in the JSON data, and returns another, where the id and name have been extracted as per our rules.

The id is extracted using a pick_id helper function which is just the data.get function with the key and not_found parameters fixed to "id" and -1 respectively.

The name is extracted using a pick_name helper function, which examines the record for the presence of the "name" key, or alternatively both the "first_name" and "last_name" keys, and returns a result as appropriate.

The step places the result of our calculation into the records field.

Now that we have a list of correct records in a field, we can use the Get Items step to get them into the row stream individually, and use the Pick Fields step to limit our fields to id and name only.

We’ve now arrived at the desired row structure: