Scanning CSV in Go
For the purpose of this article, consider the following CSV data, slightly modified from the docs for encoding/csv:
csvData := strings.NewReader(strings.Join([]string{
`first_name,last_name,username`,
`"Rob","Pike",rob`,
`Ken,Thompson,ken`,
`"Robert","Griesemer","gri"`,
}, "\n"))
Here's how you read the data, line by line, using the Reader provided in that package:
reader := csv.NewReader(csvData)
for {
	record, err := reader.Read()
	if err == io.EOF {
		break
	}
	if err != nil {
		// handle the error...
		// break? continue? neither?
	}
	fmt.Println(record)
}

// Output:
// [first_name last_name username]
// [Rob Pike rob]
// [Ken Thompson ken]
// [Robert Griesemer gri]
There are a few awkward elements to this approach:
- We are checking for io.EOF each time around the loop.
- We are checking for a non-nil error each time around the loop.
- It's not clear what kind of non-nil errors might appear and what kind of handling logic the programmer should use in each case.
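To make the last point concrete, here is a minimal sketch, using only the standard library, of telling apart the errors that encoding/csv.Reader can return. The describe and categories helpers and their category strings are hypothetical names invented for this illustration; they are not part of any package.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"strings"
)

// describe classifies an error returned by (*csv.Reader).Read.
// The category strings are illustrative only.
func describe(err error) string {
	var parseErr *csv.ParseError
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, io.EOF):
		return "end of input"
	case errors.Is(err, csv.ErrFieldCount):
		// Reported as a *csv.ParseError wrapping csv.ErrFieldCount,
		// so this case must come before the generic ParseError case.
		return "wrong number of fields"
	case errors.As(err, &parseErr):
		return "parse error"
	default:
		return "other error"
	}
}

// categories calls Read until the input ends (or an unclassified error
// appears) and returns the category of every call.
func categories(input string) []string {
	reader := csv.NewReader(strings.NewReader(input))
	var out []string
	for {
		_, err := reader.Read()
		category := describe(err)
		out = append(out, category)
		if category == "end of input" || category == "other error" {
			return out
		}
	}
}

func main() {
	// Line 2 has an extra field; line 3 has a bare quote in an unquoted field.
	input := "a,b\n1,2,3\nx,y\"z\n4,5\n"
	fmt.Printf("%q\n", categories(input))
	// Output: ["ok" "wrong number of fields" "parse error" "ok" "end of input"]
}
```

Whether to break or continue for each category is still a decision for the caller. The sketch only shows that the reader keeps going after a malformed record, which is why the question comes up at all.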
Generally, I expect CSV files to be well-formed and I break out of the read loop at the first sign of trouble. If that's also the approach you generally use, well, we've got an even more elegant way to read CSV data!
https://pkg.go.dev/github.com/smartystreets/scanners/csv
scanner := csv.NewScanner(csvData)
for scanner.Scan() {
	fmt.Println(scanner.Record())
}
if err := scanner.Error(); err != nil {
	log.Panic(err)
}

// Output:
// [first_name last_name username]
// [Rob Pike rob]
// [Ken Thompson ken]
// [Robert Griesemer gri]
This will look very familiar if you've ever used bufio.Scanner. No more cumbersome checks for io.EOF or errors in the body of the loop! By default, scanner.Scan() returns false at the first sign of an error from the underlying encoding/csv.Reader. So how do you customize the behavior of the scanner? What if the CSV data uses another character as the separator (the delimiter, or "comma")? Observe the variadic, functional configuration options accepted by csv.NewScanner:
csvDataCustom := strings.Join([]string{
	`first_name;last_name;username`, // ';' is the delimiter!
	`"Rob";"Pike";rob`,
	`# lines beginning with a # character are ignored`, // '#' is the comment character!
	`Ken;Thompson;ken`,
	`"Robert";"Griesemer";"gri"`,
}, "\n")

// csv.NewScanner reads from an io.Reader, so wrap the string first.
scanner := csv.NewScanner(strings.NewReader(csvDataCustom),
	csv.Comma(';'), csv.Comment('#'), csv.ContinueOnError(true))

for scanner.Scan() {
	if err := scanner.Error(); err != nil {
		log.Panic(err)
	} else {
		fmt.Println(scanner.Record())
	}
}

// Output:
// [first_name last_name username]
// [Rob Pike rob]
// [Ken Thompson ken]
// [Robert Griesemer gri]
Pretty flexible, right? And notice that we still don't have to detect io.EOF: that happens internally and results in scanner.Scan() returning false.
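For readers curious how a scanner of this shape can hide io.EOF and accept functional options, here is a minimal sketch built on encoding/csv alone. RecordScanner, Option, Comma, Comment, and scanAll are simplified stand-ins written for this illustration. They are assumptions about the general pattern, not the package's actual implementation, and ContinueOnError is left out.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// Option configures the wrapped *csv.Reader (hypothetical type for this sketch).
type Option func(*csv.Reader)

// Comma and Comment set the corresponding exported fields of csv.Reader.
func Comma(c rune) Option   { return func(r *csv.Reader) { r.Comma = c } }
func Comment(c rune) Option { return func(r *csv.Reader) { r.Comment = c } }

// RecordScanner is a simplified stand-in, not the library's type.
type RecordScanner struct {
	reader *csv.Reader
	record []string
	err    error
}

// NewRecordScanner applies each option to a fresh csv.Reader.
func NewRecordScanner(in io.Reader, options ...Option) *RecordScanner {
	reader := csv.NewReader(in)
	for _, option := range options {
		option(reader)
	}
	return &RecordScanner{reader: reader}
}

// Scan advances to the next record. It returns false at io.EOF or at the
// first error; io.EOF itself is never reported through Error.
func (s *RecordScanner) Scan() bool {
	if s.err != nil {
		return false // an earlier error is sticky
	}
	record, err := s.reader.Read()
	if err == io.EOF {
		return false // normal end of input
	}
	if err != nil {
		s.err = err
		return false
	}
	s.record = record
	return true
}

func (s *RecordScanner) Record() []string { return s.record }
func (s *RecordScanner) Error() error     { return s.err }

// scanAll collects every record from input and the final error, if any.
func scanAll(input string, options ...Option) ([][]string, error) {
	scanner := NewRecordScanner(strings.NewReader(input), options...)
	var records [][]string
	for scanner.Scan() {
		records = append(records, scanner.Record())
	}
	return records, scanner.Error()
}

func main() {
	input := strings.Join([]string{
		`first_name;last_name;username`,
		`# lines beginning with a # character are ignored`,
		`Ken;Thompson;ken`,
	}, "\n")
	records, err := scanAll(input, Comma(';'), Comment('#'))
	fmt.Println(records, err)
	// Output: [[first_name last_name username] [Ken Thompson ken]] <nil>
}
```

Each option is just a function that mutates the reader before scanning starts, so new settings can be added without changing the constructor's signature.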
Now, what if you are scanning the rows into struct values? Suppose we have a Contact type whose fields mirror our CSV schema. What's a nice way to encapsulate the translation from a CSV record to a Contact? Embed a *csv.Scanner in a ContactScanner and define a Record method that shadows the embedded one (Go's closest equivalent of an override), returning an instance of the Contact struct rather than the []string record!
package main

import (
	"fmt"
	"io"
	"log"
	"strings"

	"github.com/smartystreets/scanners/csv"
)

type Contact struct {
	FirstName string
	LastName  string
	Username  string
}

type ContactScanner struct{ *csv.Scanner }

func NewContactScanner(reader io.Reader) *ContactScanner {
	inner := csv.NewScanner(reader)
	inner.Scan() // skip the header!
	return &ContactScanner{Scanner: inner}
}

func (this *ContactScanner) Record() Contact {
	fields := this.Scanner.Record()
	return Contact{
		FirstName: fields[0],
		LastName:  fields[1],
		Username:  fields[2],
	}
}

func main() {
	csvData := strings.NewReader(strings.Join([]string{
		`first_name,last_name,username`,
		`"Rob","Pike",rob`,
		`Ken,Thompson,ken`,
		`"Robert","Griesemer","gri"`,
	}, "\n"))

	scanner := NewContactScanner(csvData)
	for scanner.Scan() {
		fmt.Printf("%#v\n", scanner.Record())
	}
	if err := scanner.Error(); err != nil {
		log.Panic(err)
	}

	// Output:
	// main.Contact{FirstName:"Rob", LastName:"Pike", Username:"rob"}
	// main.Contact{FirstName:"Ken", LastName:"Thompson", Username:"ken"}
	// main.Contact{FirstName:"Robert", LastName:"Griesemer", Username:"gri"}
}
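The embedding trick relies on two Go rules: methods of an embedded type are promoted to the outer type, and a method declared on the outer type with the same name takes precedence. Here is a standard-library-only sketch of just that mechanism. The rows and contactRows types are hypothetical stand-ins for the scanner and its wrapper.

```go
package main

import "fmt"

// rows plays the role of the embedded scanner: it yields raw string records.
type rows struct {
	data [][]string
	pos  int
}

// Scan advances to the next record and reports whether one exists.
func (r *rows) Scan() bool {
	if r.pos >= len(r.data) {
		return false
	}
	r.pos++
	return true
}

// Record returns the record selected by the last successful Scan.
func (r *rows) Record() []string { return r.data[r.pos-1] }

type Contact struct {
	FirstName string
	LastName  string
	Username  string
}

// contactRows embeds *rows, so Scan is promoted unchanged, while the
// Record method declared below shadows the embedded one.
type contactRows struct{ *rows }

func (c *contactRows) Record() Contact {
	fields := c.rows.Record() // the embedded method is still reachable by name
	return Contact{FirstName: fields[0], LastName: fields[1], Username: fields[2]}
}

func main() {
	scanner := &contactRows{rows: &rows{data: [][]string{
		{"Ken", "Thompson", "ken"},
	}}}
	for scanner.Scan() { // promoted from *rows
		fmt.Printf("%+v\n", scanner.Record()) // resolved to contactRows.Record
	}
	// Output: {FirstName:Ken LastName:Thompson Username:ken}
}
```

Because the two Record methods have different return types, code that holds the outer type always gets a Contact, and the raw record is only available through the embedded field.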
But we can go even further if you're not averse to using struct tags and reflection. Notice below that the StructScanner is able to populate a struct, passed by pointer, whose fields are decorated with CSV struct tags corresponding to the header column names:
package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/smartystreets/scanners/csv"
)

type Contact struct {
	FirstName string `csv:"first_name"`
	LastName  string `csv:"last_name"`
	Username  string `csv:"username"`
}

func main() {
	csvData := strings.NewReader(strings.Join([]string{
		`first_name,last_name,username`,
		`"Rob","Pike",rob`,
		`Ken,Thompson,ken`,
		`"Robert","Griesemer","gri"`,
	}, "\n"))

	scanner, err := csv.NewStructScanner(csvData)
	if err != nil {
		log.Panic(err)
	}
	for scanner.Scan() {
		var contact Contact
		if err := scanner.Populate(&contact); err != nil {
			log.Panic(err)
		}
		fmt.Printf("%#v\n", contact)
	}
	if err := scanner.Error(); err != nil {
		log.Panic(err)
	}

	// Output:
	// main.Contact{FirstName:"Rob", LastName:"Pike", Username:"rob"}
	// main.Contact{FirstName:"Ken", LastName:"Thompson", Username:"ken"}
	// main.Contact{FirstName:"Robert", LastName:"Griesemer", Username:"gri"}
}
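To give a feel for what tag-driven population involves, here is a rough sketch using only the reflect package. The populate function is a hypothetical helper written for this illustration. It handles string fields only and should not be read as how StructScanner is actually implemented.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

type Contact struct {
	FirstName string `csv:"first_name"`
	LastName  string `csv:"last_name"`
	Username  string `csv:"username"`
}

// populate copies record values into the string fields of the struct that
// target points to, matching each field's `csv` tag against the header.
// Hypothetical helper: fields without a matching column are left untouched.
func populate(header, record []string, target interface{}) error {
	value := reflect.ValueOf(target)
	if value.Kind() != reflect.Ptr || value.IsNil() || value.Elem().Kind() != reflect.Struct {
		return errors.New("target must be a non-nil pointer to a struct")
	}

	// Map each header name to its column index.
	columns := make(map[string]int, len(header))
	for i, name := range header {
		columns[name] = i
	}

	structValue := value.Elem()
	structType := structValue.Type()
	for i := 0; i < structType.NumField(); i++ {
		tag := structType.Field(i).Tag.Get("csv")
		if tag == "" {
			continue // untagged field
		}
		column, found := columns[tag]
		field := structValue.Field(i)
		if !found || column >= len(record) || !field.CanSet() || field.Kind() != reflect.String {
			continue
		}
		field.SetString(record[column])
	}
	return nil
}

func main() {
	// Column order differs from field order; the tags do the matching.
	header := []string{"username", "first_name", "last_name"}
	record := []string{"ken", "Ken", "Thompson"}

	var contact Contact
	if err := populate(header, record, &contact); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%#v\n", contact)
	// Output: main.Contact{FirstName:"Ken", LastName:"Thompson", Username:"ken"}
}
```

Matching on header names rather than positions means the struct keeps working if the columns are reordered, at the cost of a reflection pass per record.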
Clearly, there are many ways to read a CSV file (including other nicely written packages). Happy (CSV) scanning!
go get -u github.com/smartystreets/scanners/csv