Challenge #15: Finding differences in datasets


You are a data center administrator. You have 100 machines carefully configured and running smoothly.

Each machine has a mac_address, an IP, a network name and an operating system installed. You keep track of all these things like so:


The Challenge

One night, a horde of drunken cats invades the data center.

They randomly turn machines off, turn decommissioned machines on, or mess with the configuration and OS of the machines.

The only things they could not tamper with were the MAC addresses.

The next day you take stock of the state of the data center to check what you need to fix to get back to normal.

For this challenge, you need to compare the original state of the data center with the sabotaged state. Begin with start.dfl (306.1 KB)


Your data flow should determine which mac_addresses need administrative action.

  • machines that were turned off need to be booted up
  • any extra machines that were turned on need to be shut down
  • if the cats changed the operating system, specify which system should be installed
  • if the cats changed the IP, specify which IP to use
  • if the cats changed the name, specify which name to use

Your flow should produce a data set like this:

Hint: The Diff on sorted keys step is helpful in determining any changes between data sets.

The solution.dfl (329.3 KB) for this challenge uses the Diff on sorted keys step with the mac_address field as the key, tracking changes in the other fields.

The diff step tells us if a record is identical, new, deleted, or changed. We can dismiss the identical ones, and focus on the ones that need action.
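For readers who want to see the idea outside of a data flow, here is a rough Python sketch of what a diff on sorted keys does. It is not the implementation of the actual step, only an illustration: the function name `diff_sorted`, the field names `ip`, `name` and `os`, and the sample records are all made up for this example. Both inputs are assumed to be sorted by mac_address.

```python
def diff_sorted(reference, compare, key="mac_address", fields=("ip", "name", "os")):
    """Walk two key-sorted record lists in parallel and classify each key.

    Yields (status, reference_record, compare_record, changed_fields), where
    status is one of "identical", "new", "deleted" or "changed".
    """
    i = j = 0
    while i < len(reference) or j < len(compare):
        ref = reference[i] if i < len(reference) else None
        cmp_ = compare[j] if j < len(compare) else None
        if cmp_ is None or (ref is not None and ref[key] < cmp_[key]):
            # Key exists only in the reference data: the machine is gone.
            yield ("deleted", ref, None, [])
            i += 1
        elif ref is None or cmp_[key] < ref[key]:
            # Key exists only in the compared data: an extra machine.
            yield ("new", None, cmp_, [])
            j += 1
        else:
            # Same key on both sides: compare the tracked fields.
            changed = [f for f in fields if ref[f] != cmp_[f]]
            yield ("changed" if changed else "identical", ref, cmp_, changed)
            i += 1
            j += 1


# Hypothetical sample data, sorted by mac_address.
original = [
    {"mac_address": "00:00:00:00:00:01", "ip": "10.0.0.1", "name": "alpha", "os": "linux"},
    {"mac_address": "00:00:00:00:00:02", "ip": "10.0.0.2", "name": "beta", "os": "linux"},
    {"mac_address": "00:00:00:00:00:03", "ip": "10.0.0.3", "name": "gamma", "os": "linux"},
]
sabotaged = [
    {"mac_address": "00:00:00:00:00:01", "ip": "10.0.0.1", "name": "alpha", "os": "linux"},
    {"mac_address": "00:00:00:00:00:03", "ip": "10.0.0.3", "name": "gamma", "os": "windows"},
    {"mac_address": "00:00:00:00:00:04", "ip": "10.0.0.4", "name": "delta", "os": "linux"},
]

results = list(diff_sorted(original, sabotaged))
statuses = [status for status, _, _, _ in results]
# statuses == ["identical", "deleted", "changed", "new"]
```

Because both lists are sorted, a single pass is enough and no lookup table is needed.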

In the calculator step, we determine the actions necessary for any individual record.

  • In case the record was deleted, we retrieve the mac address from the reference record and set action to “boot up”
  • In case the record is new, it just needs to be “shut down”
  • In case the record was changed, we need to get it back to how it was before. One or more fields have changed, so one or more actions might be necessary. We iterate over the changes reported by the diff step, and generate a string containing the recommended action to fix the changed field. We can join that list of actions into a string, and we arrive at the required output.
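The three cases above can be sketched in Python as follows. This is only an illustration of the logic, not the calculator step from solution.dfl: the function name `required_action`, the field names, and the exact wording of the "set ... to ..." action strings are assumptions made for this example.

```python
def required_action(status, ref, cmp_, changed_fields):
    """Return (mac_address, action) for one diff result, or None if nothing to do.

    status         -- "identical", "new", "deleted" or "changed"
    ref            -- record from the original state (None for "new")
    cmp_           -- record from the sabotaged state (None for "deleted")
    changed_fields -- names of the fields that differ (for "changed")
    """
    if status == "deleted":
        # The machine is missing now, so the mac address comes from the reference record.
        return ref["mac_address"], "boot up"
    if status == "new":
        # An extra machine that should not be running.
        return cmp_["mac_address"], "shut down"
    if status == "changed":
        # One action per changed field, restoring the original value.
        actions = [f"set {field} to {ref[field]}" for field in changed_fields]
        return ref["mac_address"], ", ".join(actions)
    return None  # identical records need no action


# Hypothetical records for a machine whose IP and OS were changed.
before = {"mac_address": "00:00:00:00:00:03", "ip": "10.0.0.3", "name": "gamma", "os": "linux"}
after = {"mac_address": "00:00:00:00:00:03", "ip": "10.0.0.9", "name": "gamma", "os": "windows"}

fix = required_action("changed", before, after, ["ip", "os"])
# fix == ("00:00:00:00:00:03", "set ip to 10.0.0.3, set os to linux")
```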