Scanning CSV in Go

Michael Whatcott • May 5, 2018

For the purpose of this article, consider the following CSV data, slightly modified from the docs for encoding/csv:

csvData := strings.NewReader(strings.Join([]string{
	`first_name,last_name,username`,
	`"Rob","Pike",rob`,
	`Ken,Thompson,ken`,
	`"Robert","Griesemer","gri"`,
}, "\n"))

Here's how you read the data, line by line, using the Reader provided in that package:

reader := csv.NewReader(csvData)

for {
	record, err := reader.Read()
	if err == io.EOF {
		break
	}
	if err != nil {
		// handle the error...
		// break? continue? neither?
	}

	fmt.Println(record)
}

// Output:
// [first_name last_name username]
// [Rob Pike rob]
// [Ken Thompson ken]
// [Robert Griesemer gri]

There are a few awkward elements to this approach:

  1. We are checking for io.EOF each time around the loop.
  2. We are checking for a non-nil error each time around the loop.
  3. It's not clear what kind of non-nil errors might appear and what kind of handling logic the programmer should use in each case.

Generally, I expect CSV files to be well-formed and I break out of the read loop at the first sign of trouble. If that's also the approach you generally use, well, we've got an even more elegant way to read CSV data!

https://pkg.go.dev/github.com/smartystreets/scanners/csv

scanner := csv.NewScanner(csvData)

for scanner.Scan() {
	fmt.Println(scanner.Record())
}

if err := scanner.Error(); err != nil {
	log.Panic(err)
}

// Output:
// [first_name last_name username]
// [Rob Pike rob]
// [Ken Thompson ken]
// [Robert Griesemer gri]

This will look very familiar if you've ever used bufio.Scanner. No more cumbersome checks for io.EOF or errors in the body of the loop! By default, scanner.Scan() returns false at the first sign of an error from the underlying encoding/csv.Reader. So, how do you customize the behavior of the scanner, you ask? What if the CSV data uses another character for the separator/delimiter/comma? Observe the use of variadic, functional configuration options accepted by csv.NewScanner:

csvDataCustom := strings.NewReader(strings.Join([]string{
	`first_name;last_name;username`, // ';' is the delimiter!
	`"Rob";"Pike";rob`,
	`# lines beginning with a # character are ignored`, // '#' is the comment character!
	`Ken;Thompson;ken`,
	`"Robert";"Griesemer";"gri"`,
}, "\n"))

scanner := csv.NewScanner(csvDataCustom, csv.Comma(';'), csv.Comment('#'), csv.ContinueOnError(true))

for scanner.Scan() {
	if err := scanner.Error(); err != nil {
		log.Panic(err)
	} else {
		fmt.Println(scanner.Record())
	}
}

// Output:
// [first_name last_name username]
// [Rob Pike rob]
// [Ken Thompson ken]
// [Robert Griesemer gri]

Pretty flexible, right? And notice that we still don't have to detect io.EOF; that happens internally and results in scanner.Scan() returning false.
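If you're curious how that works under the hood, here's a rough sketch of how a scanner like this, along with its functional options, could be built on top of the standard library's encoding/csv. This is my illustration of the pattern, not the package's actual source, and the names below are only illustrative:

package sketch

import (
	"encoding/csv"
	"io"
)

// Scanner is a hypothetical stand-in for the kind of type the scanners/csv
// package provides. It wraps an encoding/csv Reader and remembers the last
// record and error.
type Scanner struct {
	reader          *csv.Reader
	record          []string
	err             error
	continueOnError bool
}

// Option is a functional configuration option: a function that mutates the
// scanner during construction.
type Option func(*Scanner)

func Comma(comma rune) Option {
	return func(s *Scanner) { s.reader.Comma = comma }
}

func Comment(comment rune) Option {
	return func(s *Scanner) { s.reader.Comment = comment }
}

func ContinueOnError(continue_ bool) Option {
	return func(s *Scanner) { s.continueOnError = continue_ }
}

func NewScanner(reader io.Reader, options ...Option) *Scanner {
	s := &Scanner{reader: csv.NewReader(reader)}
	for _, option := range options {
		option(s)
	}
	return s
}

// Scan reads the next record, returning false on io.EOF (silently) and,
// unless ContinueOnError was requested, on any other error as well.
func (s *Scanner) Scan() bool {
	s.record, s.err = s.reader.Read()
	if s.err == io.EOF {
		s.err = nil // a clean EOF ends the scan without being reported as an error
		return false
	}
	if s.err != nil && !s.continueOnError {
		return false // the default: stop at the first sign of trouble
	}
	return true
}

func (s *Scanner) Record() []string { return s.record }
func (s *Scanner) Error() error     { return s.err }

The variadic options are just functions applied to the scanner as it is constructed, which is why csv.NewScanner can accept any combination of them.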

Now, what if you want to scan the rows into struct values whose fields mirror the CSV schema? Suppose we have a Contact type for exactly that... what's a nice way to encapsulate the translation from a CSV record to a Contact? Embed a *csv.Scanner in a ContactScanner and override the Record method to return a Contact rather than the raw []string record!

package main

import (
	"fmt"
	"io"
	"log"
	"strings"

	"github.com/smartystreets/scanners/csv"
)

type Contact struct {
	FirstName string
	LastName  string
	Username  string
}

type ContactScanner struct{ *csv.Scanner }

func NewContactScanner(reader io.Reader) *ContactScanner {
	inner := csv.NewScanner(reader)
	inner.Scan() // skip the header!
	return &ContactScanner{Scanner: inner}
}

func (this *ContactScanner) Record() Contact {
	fields := this.Scanner.Record()
	return Contact{
		FirstName: fields[0],
		LastName:  fields[1],
		Username:  fields[2],
	}
}

func main() {
	csvData := strings.NewReader(strings.Join([]string{
		`first_name,last_name,username`,
		`"Rob","Pike",rob`,
		`Ken,Thompson,ken`,
		`"Robert","Griesemer","gri"`,
	}, "\n"))

	scanner := NewContactScanner(csvData)

	for scanner.Scan() {
		fmt.Printf("%#v\n", scanner.Record())
	}

	if err := scanner.Error(); err != nil {
		log.Panic(err)
	}

	// Output:
	// main.Contact{FirstName:"Rob", LastName:"Pike", Username:"rob"}
	// main.Contact{FirstName:"Ken", LastName:"Thompson", Username:"ken"}
	// main.Contact{FirstName:"Robert", LastName:"Griesemer", Username:"gri"}
}
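One side note of mine (not something the original example addresses): Record indexes fields[0] through fields[2] directly. With the default configuration the scanner stops before handing you a malformed row, but if you constructed the inner scanner with ContinueOnError(true), a short row could reach this method and panic. A slightly more defensive variant, reusing the Contact and ContactScanner types above, might look like this:

// Hypothetical defensive variant -- not part of the original example.
func (this *ContactScanner) RecordSafe() (Contact, bool) {
	fields := this.Scanner.Record()
	if len(fields) < 3 {
		return Contact{}, false // malformed row: too few fields
	}
	return Contact{
		FirstName: fields[0],
		LastName:  fields[1],
		Username:  fields[2],
	}, true
}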

But we can go even further if you're not averse to using struct tags and reflection. Notice below that the StructScanner is able to populate a pointer to a struct whose fields are decorated with CSV struct tags corresponding with the header column names:

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/smartystreets/scanners/csv"
)

type Contact struct {
	FirstName string `csv:"first_name"`
	LastName  string `csv:"last_name"`
	Username  string `csv:"username"`
}

func main() {
	csvData := strings.NewReader(strings.Join([]string{
		`first_name,last_name,username`,
		`"Rob","Pike",rob`,
		`Ken,Thompson,ken`,
		`"Robert","Griesemer","gri"`,
	}, "\n"))

	scanner, err := csv.NewStructScanner(csvData)
	if err != nil {
		log.Panic(err)
	}

	for scanner.Scan() {
		var contact Contact
		if err := scanner.Populate(&contact); err != nil {
			log.Panic(err)
		}
		fmt.Printf("%#v\n", contact)
	}

	if err := scanner.Error(); err != nil {
		log.Panic(err)
	}

	// Output:
	// main.Contact{FirstName:"Rob", LastName:"Pike", Username:"rob"}
	// main.Contact{FirstName:"Ken", LastName:"Thompson", Username:"ken"}
	// main.Contact{FirstName:"Robert", LastName:"Griesemer", Username:"gri"}
}

Clearly, there are many ways to read a CSV file (including other nicely written packages). Happy (CSV) scanning!
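As one small illustration of that variety (my aside, not part of the original examples): when the data comfortably fits in memory, the standard library's csv.Reader.ReadAll is yet another option, trading the streaming behavior above for a single call:

package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"strings"
)

func main() {
	reader := csv.NewReader(strings.NewReader(strings.Join([]string{
		`first_name,last_name,username`,
		`"Rob","Pike",rob`,
		`Ken,Thompson,ken`,
		`"Robert","Griesemer","gri"`,
	}, "\n")))

	// ReadAll slurps every remaining record into memory at once.
	records, err := reader.ReadAll()
	if err != nil {
		log.Panic(err)
	}
	for _, record := range records {
		fmt.Println(record)
	}
}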

go get -u github.com/smartystreets/scanners/csv

Source Code
