A Debugging Story: A Bug in the S3 API
In the blog post on how to import an AWS S3 bucket, I mentioned that importing a resource sometimes fails with the following error, even though the resource exists:
Error: Cannot import non-existent remote object
This blog post will detail how I debugged that issue. While I don't have a
solution, I was at least able to determine that the error was coming from the
AWS API and not a bug in Terraform.
My plan when creating the blog post was to create a test bucket, modify it, try to import it, delete it, and repeat this many times, running terraform import against different settings. Because I planned to do this many times, I wasn't paying much attention to how I was configuring the bucket. I just wanted to try a bunch of permutations and see what happened.
After some time I got the error message:
Error: Cannot import non-existent remote object
Because I had just been playing around, I didn't know if this was the first time I'd tried to import that setting or how I had set it. I wasn't sure whether this was a mistake on my part or a problem somewhere else.
At this point I decided I probably needed to dig deeper.
Up to now, I'd just been fooling around and trying things, so I had not been very systematic in my approach or taking notes. I had an aws_s3_bucket_ownership_controls resource. I didn't know how I had modified the bucket, but I knew I couldn't import that resource.
Sometimes when you do something on AWS, it takes a little while for the change to propagate. My first guess was that this was the case. I waited a few minutes and tried to import again, but the issue didn't go away, so it wasn't a propagation delay.
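A wait-and-retry check like this can be scripted so the result is unambiguous. The following is a minimal sketch: the `retry` helper, the attempt count, and the delay are hypothetical and not part of the original session.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    # Success on any attempt ends the loop immediately.
    "$@" && return 0
    # Give up once the configured number of attempts has been used.
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, `retry 5 60 terraform import aws_s3_bucket_ownership_controls.bucket terrateam-test-bucket` would try the import up to five times, one minute apart. The `.bucket` resource address here is assumed by analogy with the import command shown later in this post. If the import still fails after several minutes, propagation delay is an unlikely explanation.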
With my first guess not panning out, I decided to search Google for the error message. I found an unresolved GitHub issue where a user had experienced the same error. Because of the GitHub issue, I thought that this was a bug in the resource: for some reason, the aws_s3_bucket_ownership_controls resource was not importable. I decided to move on from the bug and make a note in the blog post saying that this resource could not be imported.
I deleted the bucket, created it again, modified it, and went back to importing attributes. After a few more rounds, I hit the error again, but this time with a different resource: aws_s3_bucket_public_access_block. I knew that I had successfully imported that before.
I then ran the import for the aws_s3_bucket_ownership_controls resource again. To my surprise, it worked.
At this moment I decided that I needed to be able to consistently recreate the error I was seeing. I went back to deleting the bucket, creating it again, and modifying it. I finally hit upon the cause and was able to recreate the issue: if I modified a configuration while creating the bucket, I could not import the attribute. If I modified it after create, I could import it. Additionally, if I created a bucket with the default configuration, I could import it.
Something weird was going on: when a setting is modified on create, Terraform cannot import it.
I then tested that if I could not import an attribute, and I modified it in the AWS console and then reverted the modification, I could import it.
I thought that this must be a bug in the AWS provider. The AWS S3 resources had recently been significantly refactored, so a bug had probably slipped through. But how to prove it? Luckily, Terraform has very detailed debug logging, especially at the trace level. I performed an import with the logging level set to trace:
env TF_LOG=trace terraform import 'aws_s3_bucket_public_access_block.bucket' terrateam-test-bucket
I saw the failed call in the output:
-----------------------------------------------------
[DEBUG] [aws-sdk-go] DEBUG: Response s3/GetPublicAccessBlock Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 404 Not Found
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Fri, 06 Jan 2023 21:50:30 GMT
Server: AmazonS3
X-Amz-Id-2: ...
X-Amz-Request-Id: ...
-----------------------------------------------------
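Trace output is long, so it helps to filter it down to the responses that failed. The following is a minimal sketch, assuming the log looks like the excerpt above: a "Response service/Operation Details:" line followed shortly by an HTTP status line on its own line. The `failed_responses` function name is hypothetical, and other provider or SDK versions may format their logs differently.

```shell
# Reads a Terraform trace log on stdin and prints "operation status"
# for every AWS SDK response with an HTTP status of 400 or higher.
failed_responses() {
  awk '
    # Remember the operation named on the "Response ... Details:" line.
    /DEBUG: Response [^ ]+ Details:/ {
      for (i = 1; i <= NF; i++) if ($i == "Response") op = $(i + 1)
    }
    # The next HTTP status line belongs to that operation.
    /^[ \t]*HTTP\/[0-9.]+ [0-9]+/ {
      if (op != "" && $2 >= 400) print op, $2
      op = ""
    }
  '
}
```

Terraform writes its log output to stderr, so a typical invocation would be `env TF_LOG=trace terraform import 'aws_s3_bucket_public_access_block.bucket' terrateam-test-bucket 2>&1 | failed_responses`.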
Looking up the public access block with the AWS API was returning 404.
To verify this, I switched over to the aws CLI tool to perform the query:
$ aws s3api get-public-access-block --bucket terrateam-test-bucket
An error occurred (NoSuchPublicAccessBlockConfiguration) when calling the GetPublicAccessBlock operation: The public access block configuration was not found
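When repeating this check across many bucket permutations, it can be easier to compare just the error code than to read the full message. Here is a small sketch that pulls the code out of the CLI's error text. The `aws_error_code` function name is hypothetical, and it assumes the error message keeps the "An error occurred (Code) when calling the ... operation" shape shown above.

```shell
# Reads AWS CLI error output on stdin and prints the error code found
# inside the parentheses. Prints nothing if no error line is present.
aws_error_code() {
  sed -n 's/^.*An error occurred (\([^)]*\)) when calling the .* operation.*$/\1/p'
}
```

The CLI prints errors to stderr, so the call would look like `aws s3api get-public-access-block --bucket terrateam-test-bucket 2>&1 | aws_error_code`. Empty output means the call did not produce an error of this form.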
I've verified that the underlying API is returning 404. But is that expected? Could there still be a bug where the provider is supposed to make a different API call in this situation?
To check this, I recreated my bucket such that it could be imported and executed the API call. I got an actual response back and not a 404.
I did another experiment where I created a bucket whose attributes I could not import, verified that the response was 404, then modified the configuration and verified that the API gave a response.
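The before-and-after comparison in that experiment can be sketched as a small probe that is run once before modifying the bucket and once after. This is an illustration only: the `probe_bucket` function name is hypothetical, and it treats any non-zero exit from the CLI as an error, so it does not distinguish a 404 from, say, a permissions problem.

```shell
# probe_bucket BUCKET
# Calls the s3api "get" operation behind each of the two resources
# discussed in this post and reports whether the call succeeded.
probe_bucket() {
  bucket=$1
  for op in get-public-access-block get-bucket-ownership-controls; do
    if aws s3api "$op" --bucket "$bucket" >/dev/null 2>&1; then
      printf '%s: found\n' "$op"
    else
      printf '%s: error\n' "$op"
    fi
  done
}
```

Running `probe_bucket terrateam-test-bucket` before and after changing a setting in the console should show an operation flip from "error" to "found" if the behavior described here is present.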
- I've managed to determine how to consistently create bucket attributes that can be successfully imported or cannot be imported.
- I was able to use debug logging in Terraform to determine that the attribute cannot be imported because the API was returning 404.
- I was able to confirm this using the aws CLI to perform the API call directly, getting the same result as Terraform.
- I was able to show that attributes that can be imported do not return 404 from the API.
- Finally, I showed that if I could not import a resource, if I then modified it in the AWS console (for example, toggling the "public access block" setting), I could then import it.
Conclusion? The AWS S3 API has a bug in it where if the configuration is modified on create, the API responds with 404 for those attributes.
Debugging is more art than science. It requires a lot of creativity. You have this system that is not working as expected and you have to come up with ways to poke it such that it will reveal more information to you. That is very dependent on the situation and it is hard to give any generic advice. But there are some things you can do to improve your odds.
- Take notes. Once I decided to start debugging this, I started taking notes. Everything I did, I recorded what it was and its output. I wanted to try a lot of different things and it was necessary that I didn't get confused about what I did or if I had tried something earlier.
- Try different things. I got to the issue being bucket creation vs bucket modification just by having a list of things to try. I had no intuition that it would be that; I was just trying everything.
- Make a hypothesis and prove (or disprove) it. I could have just stopped with "the Terraform AWS provider has a bug". But just because that is how I experienced the bug doesn't mean that is the cause. There are a lot of layers of software between Terraform and AWS, and it could be any of those. Luckily, Terraform has extensive debug logging. Without that, hopefully eventually I would have stumbled upon doing the AWS API calls myself.
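The note-taking habit is easy to automate. Below is a minimal sketch of a wrapper that records each command, its output, and its exit status in a log file while still showing the output on screen. The `note` function name and the `debug-notes.log` default are hypothetical choices for illustration.

```shell
# File that notes are appended to; override by setting NOTES_FILE first.
NOTES_FILE=${NOTES_FILE:-debug-notes.log}

# note COMMAND [ARGS...]
# Runs COMMAND and appends a timestamp, the command line, its combined
# stdout and stderr, and its exit status to NOTES_FILE.
note() {
  {
    printf '### %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
    printf '$ %s\n' "$*"
    status=0
    "$@" 2>&1 || status=$?
    printf '[exit status: %s]\n\n' "$status"
  } | tee -a "$NOTES_FILE"
}
```

For example, `note aws s3api get-public-access-block --bucket terrateam-test-bucket` leaves a timestamped record of exactly what was run and what came back, which makes it much harder to lose track of which permutation produced which result.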
If you're interested in debugging, the Oxide Computer Company has some great podcast episodes on debugging: