diff options
-rw-r--r-- | explanation.md | 138 | ||||
-rw-r--r-- | tmp.go | 115 | ||||
-rwxr-xr-x | tmp.py | 114 |
3 files changed, 367 insertions, 0 deletions
diff --git a/explanation.md b/explanation.md new file mode 100644 index 0000000..cc5f0da --- /dev/null +++ b/explanation.md @@ -0,0 +1,138 @@ +### Overall Explanation of the Code +The provided Go code is a command-line utility designed to check the +size of files passed as arguments and determine whether they should be +compressed based on their size and file extension. The program utilizes +the `gzip` command to compress files larger than 1 MB that do not have +certain compressed file extensions. It also contains a configuration map +(`URL_VARIANTS`) that allows the user to define different base URLs for +various sections, although this part of the code is not actively used in +the main logic. + +### Code Structure Overview +- **Imports**: The code imports necessary packages for logging, + executing commands, handling file paths, and manipulating strings. +- **Constants and Variables**: + - `COMPRESSED_EXTENSIONS`: A slice of strings representing file + extensions that are already considered compressed. + - `URL_VARIANTS`: A map that holds different configurations for + various sections, including base URLs and target URLs. +- **Main Function**: + - Retrieves the command name and checks if it corresponds to any entry + in `URL_VARIANTS`. + - Validates that at least one file argument is provided. + - Iterates over each file argument, checks its size and extension, and + decides whether to compress it. +- **Helper Function**: + - `contains`: Checks if a given item exists within a slice of strings. + +### Possible Bugs +1. **Missing URL Variants Handling**: The code retrieves the + configuration for the command name but does not handle cases where the + command name does not exist in `URL_VARIANTS`, which could lead to + a `nil` dereference. +2. **Error Handling**: The program exits with `os.Exit(1)` if no + arguments are provided, but it does not log a message indicating why it + exited, which could confuse users. +3. **Compression Command**: The code assumes that the `gzip` command is + available in the system's PATH. If it is not, the program will fail + without a clear error message. + +### Possible Improvements +1. **Error Messages**: Add more descriptive error messages when exiting + due to missing command variants or file arguments. +2. **Configuration Validation**: Implement checks to ensure that the + `URL_VARIANTS` map contains the necessary keys before accessing them. +3. **File Compression Feedback**: Provide feedback on successful + compression, such as logging the new file name created. +4. **Command-Line Flags**: Consider using a command-line flag library to + improve argument parsing and provide help messages. + +### External Dependencies +- The code relies on the `gzip` command-line utility for file + compression. It also uses standard Go libraries (`log`, `os`, + `os/exec`, `path/filepath`, `strconv`, and `strings`) which are part + of the Go standard library and do not require external installation. + +### Potential Security Concerns +1. **Command Injection**: The use of `exec.Command` with user-provided + input could lead to command injection vulnerabilities if not properly + sanitized. Although the current implementation does not directly take + user input for command execution, it is a consideration for future + modifications. +2. **File Permissions**: The program does not check for file permissions + before attempting to read or compress files, which could lead to + permission errors if the user does not have the necessary rights. +3. **Resource Exhaustion**: If a large number of files are processed, + the program could exhaust system resources (e.g., memory or file + descriptors) if not properly managed, especially if the compression + command spawns many subprocesses. + +In summary, while the code serves its purpose of checking file sizes and +compressing them as needed, it could benefit from improved error +handling, validation, and security measures to enhance its robustness +and usability. + +### Error Handling Analysis +1. **Use of `log.Fatal`**: The code employs `log.Fatal(err)` to handle + errors, which logs the error message and exits the program. While this + is effective for critical failures, it does not allow for graceful + recovery or error reporting to the user. +2. **Limited Error Context**: The error messages provided do not give + detailed context about where the error occurred, which can complicate + debugging. +3. **Exit on Missing Arguments**: The program exits with `os.Exit(1)` if + no arguments are provided, which is a straightforward approach but could + be improved by providing a user-friendly message indicating the expected + usage. + +### Concurrency and Threading +1. **Single-threaded Execution**: The code processes files sequentially + without any concurrency or parallelism. For large numbers of files, this + could lead to performance bottlenecks. +2. **Potential for Blocking**: The use of `exec.Command` to run external + commands (like `gzip`) can block the main thread until the command + completes, further impacting performance. + +### Refactoring Suggestions +1. **Error Handling Improvement**: Instead of using `log.Fatal`, + consider returning errors to the caller or logging them with more + context. This would allow for better error management and user feedback. +2. **Command Execution Handling**: Wrap the command execution in + a function that can handle retries or provide more detailed error + messages. +3. **Concurrency Implementation**: Introduce goroutines to process files + concurrently, which would significantly improve performance when dealing + with multiple files. +4. **Configuration Management**: Consider externalizing the + `URL_VARIANTS` configuration to a separate file or environment variables + for easier management and updates. + +### Comparisons with Best Practices +1. **Logging Practices**: While logging is used, it could be enhanced by + including timestamps and log levels (info, warning, error) for better + traceability. +2. **Function Modularity**: The `main` function is doing multiple tasks + (argument parsing, file processing, logging), which could be broken down + into smaller, more focused functions to improve readability and + maintainability. +3. **Use of Constants**: The use of constants for file extensions is + good practice, but consider using a more descriptive name for + `COMPRESSED_EXTENSIONS` to clarify its purpose. + +### Collaboration and Readability +1. **Code Comments**: The code includes some comments, but they could be + expanded to explain the purpose of functions and complex logic, making + it easier for collaborators to understand. +2. **Descriptive Variable Names**: Variable names like `cmdname` and + `processedFile` are reasonably descriptive, but further clarity could be + added by using more context-specific names. +3. **Error Messages**: Providing more informative error messages would + enhance collaboration, as other developers would have a clearer + understanding of issues that arise. +4. **Documentation**: Including a README or documentation that explains + how to use the program, its dependencies, and its configuration would + greatly benefit other developers and users. + +Overall, while the code is functional, there are several areas for +improvement in error handling, concurrency, and readability that could +enhance its robustness and maintainability. @@ -0,0 +1,115 @@ +package main + +import ( + "fmt" + "log" + "net/url" + "os" + "os/exec" + "path/filepath" + "strings" +) + +var COMPRESSED_EXTENSIONS = []string{".jpg", ".webm", ".ogg", ".gz", ".bz2", ".zip", + ".xz", ".odp", ".odt", ".ods", ".mp4", ".mp3", + ".png", ".pdf", ".apk", ".rpm", ".epub", ".mkv", + ".avi", ".jar", ".opus", ".ogv", ".flac", ".msi", + ".m4a"} + +var URL_VARIANTS = map[string]map[string]string{ + "wotan": { + "base_url": "wotan:Export/", + "target_url": "https://w3.suse.de/~mcepl/", + }, + "tmp": { + "base_url": "fedorapeople.org:public_html/tmp/", + "target_url": "https://mcepl.fedorapeople.org/tmp/", + // "shorten_api": "http://is.gd/create.php?format=simple&url=" + "shorten_api": "https://da.gd/s?url=", + }, +} + +func contains(slice []string, item string) bool { + for _, a := range slice { + if a == item { + return true + } + } + return false +} + +func main() { + if len(os.Args) < 2 { + os.Exit(1) + } + + cmdname := strings.ToLower(filepath.Base(os.Args[0])) + config := URL_VARIANTS[cmdname] + + if _, err := os.Stat(os.Args[1]); os.IsNotExist(err) { + os.Exit(1) + } + + for _, processedFile := range os.Args[1:] { + st, err := os.Stat(processedFile) + if err != nil { + log.Fatal(err) + } + fileExt := filepath.Ext(processedFile) + log.Printf("size of %s is %d.", processedFile, st.Size()) + log.Printf("extension is %s", fileExt) + + var fname string + var COMPRESSED bool + + if st.Size() > 1000000 && !contains(COMPRESSED_EXTENSIONS, strings.ToLower(fileExt)) { + log.Printf("file %s should be compressed (%d b)", processedFile, st.Size()) + err := exec.Command("gzip", "--keep", "--best", processedFile).Run() + if err != nil { + log.Fatal("Compressing the file failed!") + } + fname = processedFile + ".gz" + COMPRESSED = true + } else { + log.Printf("file %s should not be compressed (%d b)", processedFile, st.Size()) + fname = processedFile + COMPRESSED = false + } + + err = os.Chmod(fname, 0o644) + if err != nil { + log.Fatal(err) + } + modifiedFname := strings.Replace(fname, ":", "_", 1) + err = os.Rename(fname, modifiedFname) + if err != nil { + log.Fatal(err) + } + log.Printf("Unshorten fname = %s", modifiedFname) + + _, err = exec.Command("scp", modifiedFname, config["base_url"]).Output() + if err != nil { + log.Fatal(err) + } + + targetURL := config["target_url"] + filepath.Base(modifiedFname) + safeTargetURL := url.QueryEscape(targetURL) + safeTargetURL = strings.Replace(safeTargetURL, "%3A", ":", 1) + fmt.Println(safeTargetURL) + + if shortenAPI, ok := config["shorten_api"]; ok { + shortenedURLBytes, err := exec.Command("curl", "-s", shortenAPI+safeTargetURL).Output() + if err != nil { + log.Fatal(err) + } + shortenedURL := strings.TrimSpace(string(shortenedURLBytes)) + fmt.Println(shortenedURL) + } + + if COMPRESSED { + os.Remove(modifiedFname) + } else { + os.Rename(modifiedFname, fname) + } + } +} @@ -0,0 +1,114 @@ +#!/usr/bin/python3 +# +# "THE BEER-WARE LICENSE" (Revision 42): +# Matěj Cepl, <mcepl@cepl.eu> wrote this file. As long as you retain +# this notice you can do whatever you want with this stuff. If we meet +# some day, and you think this stuff is worth it, you can buy me a beer +# in return. Matěj Cepl +# +# Instructions for setting up appropriate folders on +# file.rdu.redhat.com see https://mojo.redhat.com/docs/DOC-14590 +# See also variable URL_VARIANTS below for necessary configuration. + +import logging +import os +import os.path +import subprocess +import sys +import urllib.parse +logging.basicConfig(format='%(levelname)s:%(funcName)s:%(message)s', + level=logging.INFO) + +COMPRESSED_EXTENSIONS = ['.jpg', '.webm', '.ogg', '.gz', '.bz2', '.zip', + '.xz', '.odp', '.odt', '.ods', '.mp4', '.mp3', + '.png', '.pdf', '.apk', '.rpm', '.epub', '.mkv', + '.avi', '.jar', '.opus', '.ogv', '.flac', '.msi', + '.m4a'] + +# THE VARIABLE URL_VARIANTS IS MEANT TO BE EDITED. Change various +# attributes to correspond to your situation. +# If new section is added (as here I have also 'tmp' for fedorapeople.org) +# and the script has a symlink named as the section (so I have tmp as +# a symlink to this script) it saves the file to the location defined in +# such section when called under the other name. +# So, `tmp filename` saves file filename to fedorapeople for me. +URL_VARIANTS = { + 'barstool': { + 'base_url': 'mcepl@shell.eng.rdu.redhat.com:public_html/', + 'target_url': 'http://file.rdu.redhat.com/~mcepl/', + 'shorten_api': 'https://url.corp.redhat.com/new?' + }, + 'wotan': { + 'base_url': 'wotan:Export/', + 'target_url': 'https://w3.suse.de/~mcepl/' + }, + 'tmp': { + 'base_url': 'fedorapeople.org:public_html/tmp/', + 'target_url': 'https://mcepl.fedorapeople.org/tmp/', + # 'shorten_api': 'http://is.gd/create.php?format=simple&url=' + 'shorten_api': 'https://da.gd/s?url=' + } +} + +cmdname = os.path.basename(sys.argv[0]).lower() +config = URL_VARIANTS[cmdname] + +if not os.path.exists(sys.argv[1]): + sys.exit(1) + +for processed_file in sys.argv[1:]: + st = os.stat(processed_file).st_size + file_ext = os.path.splitext(processed_file)[1] + logging.debug('size of {} is {:,}.'.format(processed_file, st)) + logging.debug('extension is {}'.format(file_ext)) + if st > 1000000 and file_ext.lower() not in COMPRESSED_EXTENSIONS: + logging.debug( + "file {} should be compressed ({:,} b)".format( + processed_file, st)) + try: + ret = subprocess.check_call( + ['gzip', '--keep', '--best', processed_file]) + except subprocess.CalledProcessError: + logging.error('Compressing the file failed!') + raise + fname = processed_file + '.gz' + COMPRESSED = True + else: + logging.debug( + 'file {} should not be compressed ({:,} b)'.format( + processed_file, st)) + fname = processed_file + COMPRESSED = False + + os.chmod(fname, 0o0644) + + # To make scp happy + modified_fname = fname.replace(':', '_') + os.rename(fname, modified_fname) + + logging.debug('Unshorten fname = {}'.format(modified_fname)) + + subprocess.check_call(['scp', '{}'.format(modified_fname), + config['base_url']]) + + target_URL = config['target_url'] + os.path.basename(modified_fname) + + # Make target_URL into a safe URL + safe_target_URL = urllib.parse.quote(target_URL).replace('%3A', ':', 1) + + print(safe_target_URL) + + # curl -s 'http://is.gd/create.php?format=simple&url=www.example.com' + # http://is.gd/MOgh5q + + if 'shorten_api' in config: + shortened_URL = subprocess.check_output( + ['curl', '-s', config['shorten_api'] + + safe_target_URL]).decode().strip() + print(shortened_URL) + + # The script should have no side-effects + if COMPRESSED: + os.unlink(modified_fname) + else: + os.rename(modified_fname, fname) |