This is my first Go program. I’m learning the language, but some of its concepts are a bit difficult to grasp, so I wrote this to practice. It’s a simple program that recursively checks a directory for duplicate files.
It computes a SHA-256 hash of each file to determine whether two files are identical. I spawn multiple workers to handle the hashing.
Here is how it works:
- 5 workers (goroutines) are spawned, each waiting for file paths to process on the same channel, named `input` in my code.
- 1 goroutine is spawned to recursively search for files in the directory and populate the `input` channel with file names.
- The main goroutine processes the results as soon as they are available and adds them to a map of sha256->[file, file, …].
Finally we just display the duplicates.
Please feel free to comment on anything. I really want to improve my Go, especially when it comes to writing “idiomatic” Go.

    package main

    import (
    	"crypto/sha256"
    	"encoding/hex"
    	"fmt"
    	"io/ioutil"
    	"os"
    	"path/filepath"
    	"sync"
    )

    const DIR = "/home/thibaut/SynologyDrive"
    const WORKERS = 5

    type Result struct {
    	file   string
    	sha256 [32]byte
    }

    func worker(input chan string, results chan<- *Result, wg *sync.WaitGroup) {
    	for file := range input {
    		data, err := ioutil.ReadFile(file)
    		if err != nil {
    			panic(err)
    		}
    		sha := sha256.Sum256(data)
    		results <- &Result{
    			file:   file,
    			sha256: sha,
    		}
    	}
    	wg.Done()
    }

    func search(input chan string) {
    	filepath.Walk(DIR, func(path string, info os.FileInfo, err error) error {
    		if err != nil {
    			panic("Error while reading files")
    		}
    		if !info.IsDir() {
    			input <- path
    		}
    		return nil
    	})
    	close(input)
    }

    func main() {
    	// Results objects will be allocated from a pool to reduce garbage
    	// collector pressure when parsing a lot of files.
    	input := make(chan string)
    	results := make(chan *Result)

    	wg := sync.WaitGroup{}
    	wg.Add(WORKERS)

    	for i := 0; i < WORKERS; i++ {
    		go worker(input, results, &wg)
    	}

    	go search(input)

    	go func() {
    		wg.Wait()
    		close(results)
    	}()

    	counter := make(map[[32]byte][]string)
    	for result := range results {
    		counter[result.sha256] = append(counter[result.sha256], result.file)
    	}

    	for sha, files := range counter {
    		if len(files) > 1 {
    			fmt.Printf("Found %d duplicates for %s: \n", len(files), hex.EncodeToString(sha[:]))
    			for _, f := range files {
    				fmt.Println("-> ", f)
    			}
    		}
    	}
    }