Answer the questions below and keep track of your answers somewhere (a notepad?). You will enter your answers into Canvas. Work out the answers BEFORE opening Canvas; otherwise you’ll run out of time.
Task 1: Elephant in the room
Look at the political cartoons below, from 1996, depicting Ernesto Samper, then President of Colombia. Do some digging around online, and in 3-5 sentences (TOTAL) discuss the meaning and intention of each cartoon. Some guiding questions:
- What is each cartoon referencing? What point is the author trying to emphasize in each cartoon?
- Why an elephant? Why is it behind Samper?
- What’s the connection between the piggy bank and what Samper is saying in the second cartoon?
Cartoon by Vladimir Florez (“Vladdo”), 1996
Describe the first cartoon.
Explain the second cartoon.
Task 2: Catching corruption
Here we will use a dataset to learn how to detect fraud and corruption in data. The dataset contains payment data from a division of a West Coast utility company. The variable of interest is Amount, the amount the company paid for various services.
- Download this: Benford’s Law data
State corruption plays a big role in the success of criminal organizations. Some of this corruption involves data and numbers (e.g., payments for services, money laundering, etc.). What tools do regulators have to detect corruption and fraud in data? One simple (but kinda clever) tool relies on Benford’s Law.
- Google what Benford’s Law is and its role in detecting fraud. In your own words, briefly describe how someone could use Benford’s Law to detect irregularities in data.
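Benford’s Law assigns each leading-digit pattern d an expected share P(d) = log10(1 + 1/d). For a single first digit, d runs from 1 to 9 (so roughly 30.1% of values start with 1); for the first two digits, d runs from 10 to 99. The Python sketch below computes both sets of expectations. Python is our choice for illustration only (the assignment itself is done in a spreadsheet), and the function name benford_probs is hypothetical.

```python
import math
from typing import Dict

def benford_probs(first: int = 10, last: int = 99) -> Dict[int, float]:
    """Expected share of values whose leading digits equal d, for d in
    [first, last], under Benford's Law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(first, last + 1)}

two_digit = benford_probs()        # leading pairs 10..99 (used in this task)
one_digit = benford_probs(1, 9)    # the classic first-digit version

print(round(one_digit[1], 3))               # share of values starting with 1
print(round(two_digit[35], 4))              # share of values starting with 35
print(round(sum(two_digit.values()), 6))    # the 90 shares sum to 1
```

Multiplying one of these shares by the number of usable amounts gives the count Benford’s Law would predict for that pair; large gaps between predicted and observed counts are the "irregularities" a reviewer would look into.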
Let’s use Benford’s Law on the dataset above. Examples of Benford’s Law usually rely on the distribution of the first digit in a number, but you can also use the first two digits. With the dataset:
Create a column called first_digits that shows the first two digits in the Amount column. You will need to use the function LEFT() (look it up!).
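To see what LEFT() is doing, here is a minimal Python analogue, assuming each amount is read as a plain number. The helper first_two_digits is hypothetical. Like the spreadsheet function, it works on the text form of the number and keeps the first two characters; it returns None when those characters are not both digits (amounts below 10, negatives, zero), which is worth checking for in your own column.

```python
from typing import Optional

def first_two_digits(amount: float) -> Optional[int]:
    """Hypothetical helper mimicking the spreadsheet formula =LEFT(Amount, 2).

    LEFT() operates on the text form of a number, so we do the same: take the
    first two characters and keep them only if both are digits. Amounts below
    10, negative amounts, and zero give None (their prefix is e.g. "7." or "-3").
    """
    prefix = str(amount)[:2]  # same idea as LEFT(text, 2)
    return int(prefix) if len(prefix) == 2 and prefix.isdigit() else None

for amt in [3512.75, 1204, 98.6, 7.5, -35.0]:
    print(amt, "->", first_two_digits(amt))
```

In a two-digit Benford analysis, rows that yield None would usually be filtered out rather than counted.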
Next, using a pivot table, COUNT the number of times each pair of first digits appears in the dataset. This is the same procedure you used earlier to tally ethnic identity categories.
How many times was there an amount that began with the digits “35”?
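The pivot-table COUNT is just a frequency tally. A minimal Python equivalent uses collections.Counter; the first_digits values below are made up for illustration and are NOT the course dataset.

```python
from collections import Counter

# Hypothetical first_digits values (NOT the course data); in practice these
# would come from the first_digits column built with LEFT().
first_digits = [35, 12, 10, 35, 11, 98, 35, 10, 12]

# Counter plays the role of the pivot table: one row per pair, COUNT per row.
tally = Counter(first_digits)

for pair in sorted(tally):
    print(pair, tally[pair])

# Looking up one pair answers "how many amounts began with 35?"
print("Amounts starting with 35:", tally[35])
```

A Counter returns 0 for a pair that never occurs, which matches a pivot table simply having no row for it.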
Finally, make a barplot with the digit pairs on the x-axis and the number of times each pair appears on the y-axis. It should mostly (but not quite) look like the Benford’s Law distribution. Save this – you will submit it.
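If you want a quick check of the plot's shape before building the chart, here is a text-only sketch using only the Python standard library. The function text_barplot and the counts in demo are hypothetical; each line is one pair (the x-axis) and the bar length is proportional to its count (the y-axis).

```python
from collections import Counter
from typing import List

def text_barplot(tally: Counter, width: int = 40) -> List[str]:
    """Render a horizontal text bar chart: one line per digit pair 10..99,
    bar length proportional to that pair's count (longest bar = width)."""
    if not tally:
        return []
    biggest = max(tally.values())
    lines = []
    for pair in range(10, 100):          # keep all 90 pairs, even empty ones
        count = tally.get(pair, 0)
        bar = "#" * round(width * count / biggest)
        lines.append(f"{pair:2d} | {bar} {count}")
    return lines

# Hypothetical counts, for illustration only.
demo = Counter({10: 40, 11: 36, 12: 33, 35: 30, 36: 11, 99: 4})
for line in text_barplot(demo)[:5]:
    print(line)
```

Keeping empty pairs as zero-length bars matters here: a gap or a spike is only visible if every pair from 10 to 99 gets its own position on the axis.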
Eyeball the plot: near the middle of the distribution, one point clearly violates Benford’s Law. It is a pair of digits that appears an abnormally large number of times in the data. Which pair is it? Write it as a number (e.g., 23).
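The eyeball judgment can be cross-checked numerically by comparing each pair's observed count with its Benford expectation. The sketch below uses a simple z-score (no continuity correction), which is our choice and not something the assignment requires; most_excess_pair is a hypothetical name, and the spike in the demo data is planted at 50 purely for illustration and says nothing about the real dataset.

```python
import math
from collections import Counter
from typing import Tuple

def most_excess_pair(tally: Counter) -> Tuple[int, float]:
    """Return the leading pair (10..99) whose observed count is furthest ABOVE
    its Benford expectation, scored with a simple z-statistic (no continuity
    correction): z = (observed - n*p) / sqrt(n*p*(1 - p)), p = log10(1 + 1/d)."""
    n = sum(tally.values())
    if n == 0:
        raise ValueError("tally is empty")
    best_pair, best_z = 10, -math.inf
    for d in range(10, 100):
        p = math.log10(1 + 1 / d)        # Benford share for pair d
        expected = n * p
        z = (tally.get(d, 0) - expected) / math.sqrt(expected * (1 - p))
        if z > best_z:
            best_pair, best_z = d, z
    return best_pair, best_z

# Hypothetical demo: 10,000 Benford-shaped counts plus 150 extra payments
# planted at pair 50. The spike is invented, not taken from the course data.
demo = Counter({d: round(10_000 * math.log10(1 + 1 / d)) for d in range(10, 100)})
demo[50] += 150
pair, z = most_excess_pair(demo)
print(pair, round(z, 1))
```

A z-score puts pairs on a common scale, so an excess of a few dozen at a rare pair near the middle can stand out even though the bars for 10 and 11 are much taller.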