Commit message (Collapse)AuthorAgeFilesLines
* [policies] Deal 200 return code as successPavel Moravec2021-11-241-1/+1
| | | | | | | | | | Return code 200 of a POST request must be treated as success. Newly required because the SFTP API change uses POST. Related to: #2764 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
* [redhat] update SFTP API version to v2Pavel Moravec2021-11-241-5/+5
| | | | | | | | | | | Change API version from v1 to v2, which includes: - change of URL - different URI - POST method for token generation instead of GET Resolves: #2764 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
* [policies] strip path from SFTP upload filenamePavel Moravec2021-11-241-3/+3
| | | | | | | | | | | When case_id is not supplied, we ask the SFTP server to store the uploaded file under the name /var/tmp/<tarball>, which is confusing. Remove the path from the name when case_id is not supplied as well. Related to: #2764 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
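A minimal sketch of the path stripping described above. The helper name `upload_name` and the `<case_id>_<file>` naming scheme are assumptions made for illustration, not taken from the sos source.

```python
import os


def upload_name(archive_path, case_id=None):
    """Return the file name to request on the SFTP server (hypothetical helper)."""
    # Drop the local directory (e.g. /var/tmp) in every case
    base = os.path.basename(archive_path)
    # Assumed naming scheme: prefix the case id when one was supplied
    return f"{case_id}_{base}" if case_id else base
```

With this shape, the server never sees the local temporary directory, whether or not a case id is given.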
* [collect] fix moved get_upload_url under Policy classPavel Moravec2021-11-241-1/+1
| | | | | | | | | SoSCollector no longer declares the get_upload_url method, as it was moved under the Policy class(es). Resolves: #2766 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
* [clean, hostname] Fix unintentionally case sensitive shortname handlingJake Hunsaker2021-11-193-7/+28
| | | | | | | | It was discovered that our extra handling for shortnames was unintentionally case sensitive. Fix this to ensure that shortnames are obfuscated regardless of case in all collected text. Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
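The case-insensitive matching described above could look roughly like the following. The function name and the boundary rule (a short name must not be embedded in a longer word) are assumptions; the actual cleaner logic may differ.

```python
import re


def obfuscate_shortname(text, shortname, replacement):
    """Replace every occurrence of shortname, regardless of case (illustrative)."""
    # Lookarounds keep "node1" from matching inside "node10" or "mynode1"
    pattern = re.compile(
        r"(?<![\w-])" + re.escape(shortname) + r"(?![\w-])",
        re.IGNORECASE,
    )
    return pattern.sub(replacement, text)
```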
* [clean,hostname_parser] Source /etc/hosts for obfuscationJake Hunsaker2021-11-193-4/+31
| | | | | | | | | | | | | | | | | | | | | | Up until now, our sourcing of hostnames/domains for obfuscation has been dependent upon the output of the `hostname` command. However, some scenarios have come up where sourcing `/etc/hosts` is advantageous for several reasons: First, if `hostname` output is unavailable, this provides a fallback measure. Second, `/etc/hosts` is a common place to have short names defined which would otherwise not be detected (or at the very least would result in a race condition based on where/if the short name was elsewhere able to be gleaned from an FQDN), thus leaving the potential for unobfuscated data in an archive. Due to both the nature of hostname obfuscation and the malleable syntax of `/etc/hosts`, the parsing of this file needs special handling not covered by our more generic parsing and obfuscation methods. Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
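As a rough illustration of why `/etc/hosts` needs its own parsing, the sketch below splits each entry into FQDNs and short names. The function name, the lower-casing, and the decision to skip `localhost*` names are assumptions, not the parser's confirmed behaviour.

```python
def parse_hosts(content):
    """Return (fqdns, shortnames) found in /etc/hosts-style text (illustrative)."""
    fqdns, shortnames = set(), set()
    for line in content.splitlines():
        # Strip trailing comments, then skip blank or comment-only lines
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # First field is the address; the rest are names and aliases
        for name in line.split()[1:]:
            if name.startswith("localhost"):
                continue  # assumption: localhost entries are not obfuscated
            (fqdns if "." in name else shortnames).add(name.lower())
    return fqdns, shortnames
```

Short names collected this way can be obfuscated directly, rather than only when they happen to be derivable from an FQDN seen elsewhere.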
* [nvidia]:Patch to update nvidia plugin for GPU infoMamatha Inamdar2021-11-171-2/+13
| | | | | | | | | | This patch is to update nvidia plugin to collect logs for Nvidia GPUs Signed-off-by: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Reported-by: Borislav Stoymirski <borislav.stoymirski@bg.ibm.com> Reported-by: Yesenia Jimenez <yesenia@us.ibm.com>
* Collect "ip route cache" dataMichael Cambria2021-11-171-0/+2
| | | | Signed-off-by: Michael Cambria <mcambria@redhat.com>
* [presets] Optimise OCP preset for hundreds of network namespacesPavel Moravec2021-11-101-3/+7
| | | | | | | | | | | | | | | | | An sos report on an OCP system with hundreds of namespaces times out in the networking plugin, as it collects >10 commands for each namespace. Use a balanced approach of: - increasing network.timeout - limiting namespaces to traverse - disabling ethtool per namespace to ensure that sos report finishes successfully in a reasonable time while collecting a reasonable amount of data. Resolves: #2754 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
* [report] Calculate sizes of dirs, symlinks and manifest in estimate modePavel Moravec2021-11-101-28/+28
| | | | | | | | | | | | | | | | | Enhance --estimate-mode to also calculate the sizes of: - symlinks - directories themselves - manifest.json file Use the os.lstat() method instead of os.stat() to properly calculate the sizes (for example, the size of a symlink itself and not of its destination). Print the five biggest plugins instead of three, as sos logs and reports often appear as one "plugin" in the list. Resolves: #2752 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
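The difference between `os.lstat()` and `os.stat()` matters here because `os.stat()` follows symlinks and would count the target's size. A minimal sketch of a size estimate that counts directories and symlinks themselves (the function name is hypothetical, and this is not the sos implementation):

```python
import os


def estimate_size(root):
    """Sum the on-disk sizes of a tree without following symlinks (illustrative)."""
    # Count the top-level directory entry itself
    total = os.lstat(root).st_size
    # os.walk does not descend into symlinked directories by default
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # lstat reports the link's own size, not the size of its target
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total
```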
* Catch exceptions on user prompt for ENTER / Ctrl+CPavel Moravec2021-11-094-4/+11
| | | | | | | | | | | Catch unhandled EOFError in collector and cleaner. Update the behaviour in report that redundantly prints the error message twice. Resolves: #2751 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
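A sketch of the prompt handling described above: `input()` raises `EOFError` when stdin is closed (for example when run non-interactively), so it is caught alongside `KeyboardInterrupt`. The function name, prompt text, and injectable `prompt_fn` parameter are assumptions for illustration.

```python
def wait_for_confirmation(prompt_fn=input):
    """Return True if the user pressed ENTER, False on Ctrl+C or closed stdin."""
    try:
        prompt_fn("Press ENTER to continue, or CTRL-C to quit\n")
        return True
    except (KeyboardInterrupt, EOFError):
        # Both cases mean "do not continue"; the caller reports this once
        return False
```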
* [sos] Reference TMPDIR environment variable and check fstype of tmpdirJake Hunsaker2021-11-091-4/+21
| | | | | | | | | | | | | | | | | | | | | If the `TMPDIR` env var is set, we should reference it if the user has not provided `--tmp-dir` by the cmdline or sos.conf. The order of precedence is now: 1. cmdline use of `--tmp-dir` 2. setting `tmp-dir` in `/etc/sos/sos.conf` 3. the `TMPDIR` environment variable 4. `/var/tmp` as a default Additionally, we will now check if the filesystem type for our tmpdir is tmpfs, and if so print a warning to the user about the potential pitfalls of doing so. This information is now recorded in the manifest as well. Closes: #2738 Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
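The precedence order above can be sketched as a simple first-match lookup. Both function names are hypothetical, and detecting the filesystem type by scanning `/proc/mounts`-style text is an assumption; the commit does not say how sos performs the check.

```python
import os


def resolve_tmpdir(cmdline=None, conf=None, environ=None):
    """Pick the temp dir: cmdline, then sos.conf, then TMPDIR, then /var/tmp."""
    environ = os.environ if environ is None else environ
    for candidate in (cmdline, conf, environ.get("TMPDIR")):
        if candidate:
            return candidate
    return "/var/tmp"


def fstype_of(path, mounts_text):
    """Return the fs type of the longest mount point containing path (illustrative)."""
    best, fstype = "", None
    target = path.rstrip("/") + "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mountpoint = fields[1]
        # Compare with trailing slashes so /tmp does not match /tmpfoo
        if target.startswith(mountpoint.rstrip("/") + "/") and len(mountpoint) > len(best):
            best, fstype = mountpoint, fields[2]
    return fstype
```

A caller could then warn when `fstype_of(tmpdir, ...)` returns `tmpfs`, since the archive would be built in memory-backed storage.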
* [report] fix filter_namespace per patternPavel Moravec2021-11-091-8/+7
| | | | | | | | | | | | Currently, -k networking.namespace_pattern=.. is broken as the R.E. test forgets to add the namespace in case of a positive match. Also ensure that both plugopts namespace_pattern and namespaces work together. Resolves: #2748 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
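A sketch of how a pattern filter and a count limit can cooperate. The function name, the space-separated `*` wildcard syntax, and the parameter names are assumptions; the point is only that a matching namespace must actually be appended, and that the limit applies to the namespaces kept.

```python
import re


def filter_namespaces(namespaces, pattern=None, limit=None):
    """Keep namespaces matching any pattern, up to limit entries (illustrative)."""
    # Assumed syntax: space-separated patterns where "*" is a wildcard;
    # other regex metacharacters are passed through unescaped
    regexes = [re.compile(p.replace("*", ".*") + "$") for p in (pattern or "").split()]
    kept = []
    for ns in namespaces:
        if limit and len(kept) >= limit:
            break
        if regexes and not any(r.match(ns) for r in regexes):
            continue
        # The positive-match branch must add the namespace
        kept.append(ns)
    return kept
```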
* [ceph_osd] Add more data collectionNikhil Kshirsagar2021-11-081-1/+51
| | | | | | | | | Enhance the ceph_osd plugin to collect more data specific to OSD nodes. Related: #1945 Resolves: #2735 Signed-off-by: Nikhil Kshirsagar <nkshirsagar@gmail.com>
* [Plugin] Ensure specific plugin timeouts are only set for that pluginJake Hunsaker2021-11-033-10/+55
| | | | | | | | | | | | | | | | | It was discovered that setting a specific plugin timeout via the `-k $plugin.timeout` option could influence the timeout setting for other plugins that are not also having their timeout explicitly set. Fix this by moving the default plugin opts into `Plugin.__init__()` so that each plugin is ensured a private copy of these default plugin options. Additionally, add more timeout data to plugin manifest entries to allow for better tracking of this setting. Adds a test case for this scenario. Closes: #2744 Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
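The underlying Python pitfall is a mutable class attribute shared by every instance. The two toy classes below (hypothetical names, not the sos `Plugin` class) contrast the shared form with a per-instance copy made in `__init__()`.

```python
import copy


class SharedDefaults:
    # Class attribute: a single dict shared by all instances
    options = {"timeout": -1}


class PrivateDefaults:
    _defaults = {"timeout": -1}

    def __init__(self):
        # Each instance gets its own private copy of the defaults
        self.options = copy.deepcopy(self._defaults)


a, b = SharedDefaults(), SharedDefaults()
a.options["timeout"] = 30   # also changes what b sees

c, d = PrivateDefaults(), PrivateDefaults()
c.options["timeout"] = 30   # d keeps the default
```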
* [report] shutdown threads for timeouted pluginsPavel Moravec2021-11-031-0/+1
| | | | | | | | | | Wait for the threads of timed-out plugins to shut down, to prevent them from writing to moved auxiliary files like sos_logs/sos.log Resolves: #2722 Closes: #2746 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
* [plugins] remove py2/3 relic in __nonzero__ / __bool__Pavel Moravec2021-11-031-6/+1
| | | | | | Resolves: #2743 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
* [Plugin] Rework get_container_logs to be more usefulJake Hunsaker2021-11-032-6/+18
| | | | | | | | | | | `get_container_logs()` is now `add_container_logs()` to align it better with our more common `add_*` methods for plugin collections. Additionally, it has been extended to accept either a single string or a list of strings like the other methods, and plugin authors may now specify either specific container names or regexes. Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
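Accepting "a single string or a list of strings" where each entry may be a name or a regex can be sketched as below. `select_containers` is a hypothetical helper, not the signature of `add_container_logs()`.

```python
import re


def select_containers(wanted, running):
    """Return running container names matching any wanted name or regex."""
    # Normalise a single string into a one-element list
    if isinstance(wanted, str):
        wanted = [wanted]
    # fullmatch treats a plain name as an exact match and a regex as a pattern
    return [
        name for name in running
        if any(re.fullmatch(pat, name) for pat in wanted)
    ]
```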
* [sosnode] Fix typo and small logic breakJake Hunsaker2021-10-292-3/+5
| | | | | | | | | Fixes a typo in setting the non-primary node options from the ocp profile against the sosnode object. Second, fixes a small break in checksum handling for the manifest discovered during `oc` transport testing for edge cases. Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
* [ocp] Create temporary project and restrict default node list to mastersJake Hunsaker2021-10-294-6/+64
| | | | | | | | | | | | | Adds explicit setup of a new project to use in the `ocp` cluster and adds better handling of cluster setup generally, which the `ocp` cluster is the first to make use of. Included in this change is a correction to `Cluster.exec_primary_cmd()`'s use of `get_pty`, which is now determined by whether the primary node is the local node or not. Additionally, based on feedback from the OCP engineering team, by default restrict node lists to masters. Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
* [transports] Add 'oc' as a transport option for remote nodesJake Hunsaker2021-10-295-6/+257
| | | | | | | | | | | | | | | This commit adds a new transport for `sos collect` by leveraging a locally available `oc` binary that has been properly configured for access to an OCP cluster. This transport will allow users to use `sos collect` to collect reports from an OCP cluster without directly connecting to any of the nodes involved. We do this by using the `oc` binary to first launch a pod on target node(s) and then exec our discovery commands and eventual `sos report` command to that pod. This in turn is dependent on a functional API for the `oc` binary to communicate with. In the event that `oc` is not __locally__ available or is not properly configured, we will fall back to the current default of using SSH ControlPersist to directly connect to the nodes. Otherwise, the OCP cluster will attempt to automatically use this new transport.
* [collect] Add --transport option and allow clusters to set transport typeJake Hunsaker2021-10-295-2/+58
| | | | | | | | | | | | | | | Adds a new `--transport` option for users to be able to specify the type of transport to use when connecting to nodes. The default value of `auto` will defer to the cluster profile to set the transport type, which will continue to default to use OpenSSH's ControlPersist feature. Clusters may override the new `set_transport_type()` method to change the default transport used. If `--transport` is anything besides `auto`, then the cluster profile will not be deferred to when choosing a transport for each remote node. Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
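The deferral logic can be sketched as follows. Only `set_transport_type()` and the `auto` value come from the commit message; the class names, the `choose_transport` helper, and the transport strings `control_persist` and `oc` are assumptions for illustration.

```python
class Cluster:
    def set_transport_type(self):
        # Assumed default: OpenSSH ControlPersist
        return "control_persist"


class OcpLikeCluster(Cluster):
    def set_transport_type(self):
        # A cluster profile may override the default transport
        return "oc"


def choose_transport(cli_value, cluster):
    """Honor an explicit --transport value; otherwise defer to the cluster."""
    if cli_value != "auto":
        return cli_value
    return cluster.set_transport_type()
```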
* [convert2rhel] Add archived log collectionRodolfo Olivieri2021-10-271-1/+2
| | | | | | | Convert2RHEL will now archive old logs for the sake of simplicity, so we are including the archive directory in the collection as well. Signed-off-by: Rodolfo Olivieri <rolivier@redhat.com>
* [foreman] Collect 'scl enable tfm gem list' againPavel Moravec2021-10-261-3/+5
| | | | | | | | | Collection of this output stopped when the foreman plugin was split. Related: #2730 Resolves: #2731 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
* [report] Fill scls_matched property completelyPavel Moravec2021-10-261-2/+2
| | | | | | | | | | The scls_matched property needs to be generated by executing the whole loop. Therefore, we can report that some SCL was found only after the loop has finished. Related: #2730 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
* [report] Dont overwrite PATH within scl commandsPavel Moravec2021-10-261-17/+2
| | | | | | | | | "scl enable .." injects the SCL sub-path by itself, so we are doing a redundant step that could even be harmful. Related: #2730 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
* [hpssm] Add HP Smart Storage Management plugin supportVikas Goel2021-10-261-0/+46
| | | | Signed-off-by: Vikas Goel <vikas.goel@gmail.com>
* [firewall_tables] Call iptables only when nft ip filter table existsPavel Moravec2021-10-251-12/+14
| | | | | | | | | | | iptables -vnxL creates nft 'ip filter' table if it does not exist, hence we must guard iptables execution by presence of the nft table. An equivalent logic applies to ip6tables. Resolves: #2724 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
* [firewall_tables] call iptables -t <table> based on nft listPavel Moravec2021-10-251-7/+22
| | | | | | | | | | | | | | | | | | | | | | If iptables are not really in use, calling iptables -t <table> would load the corresponding nft table. Therefore, call iptables -t only for the tables from "nft list ruleset" output. Example: nft list ruleset contains table ip mangle { .. } so we can collect iptables -t mangle -nvL . The same applies to ip6tables as well. Resolves: #2724 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
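A sketch of deriving the safe `iptables -t` calls from `nft list ruleset` output. The function names are hypothetical, and restricting the result to the five standard iptables table names is an added assumption; the plugin's real parsing may differ.

```python
import re

# The tables iptables itself knows about; other nft table names are skipped
IPTABLES_TABLES = ("filter", "nat", "mangle", "raw", "security")


def nft_tables(ruleset, family="ip"):
    """List table names of one family declared in 'nft list ruleset' text."""
    pattern = r"^table %s (\S+) \{" % re.escape(family)
    found = re.findall(pattern, ruleset, flags=re.MULTILINE)
    return [t for t in found if t in IPTABLES_TABLES]


def iptables_commands(ruleset):
    """Build only the commands whose nft table already exists (illustrative)."""
    cmds = [f"iptables -t {t} -nvL" for t in nft_tables(ruleset, "ip")]
    cmds += [f"ip6tables -t {t} -nvL" for t in nft_tables(ruleset, "ip6")]
    return cmds
```

Because only existing tables are queried, running the collection does not create new nft tables as a side effect.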
* [report] Use log_skipped_cmd method inside collect_cmd_outputPavel Moravec2021-10-251-18/+8
| | | | | | | | Also, remove obsolete parameters of the log_skipped_cmd method. Related: #2724 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
* [report] check for symlink before rmtree when opt estimate-only is usedEric Desrochers2021-10-191-1/+1
| | | | | | | | | | | | | | | | Check if the dir is also symlink before performing rmtree() method so that unlink() method can be used instead. Traceback (most recent call last): File "./bin/sos", line 22, in <module> sos.execute() File "/tmp/sos/sos/__init__.py", line 186, in execute self._component.execute() OSError: Cannot call rmtree on a symbolic link Closes: #2727 Signed-off-by: Eric Desrochers <eric.desrochers@canonical.com>
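`shutil.rmtree()` refuses to operate on a symbolic link, which is the `OSError` in the traceback above. A minimal sketch of the guard (the function name is hypothetical):

```python
import os
import shutil


def remove_tmp_path(path):
    """Remove a directory tree, or just the link if path is a symlink."""
    if os.path.islink(path):
        # Remove only the link; the directory it points to is left alone
        os.unlink(path)
    else:
        shutil.rmtree(path)
```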
* [plugins] Update plugins to use new os.path.join wrapperJake Hunsaker2021-10-1837-115/+115
| | | | | | | | Updates plugins to use the new `self.path_join()` wrapper for `os.path.join()` so that these plugins now account for non-/ sysroots for their collections. Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
* [Plugin,utilities] Add sysroot wrapper for os.path.joinJake Hunsaker2021-10-182-21/+28
| | | | | | | | Adds a wrapper for `os.path.join()` which accounts for non-/ sysroots, like we have done previously for other `os.path` methods. Further updates `Plugin()` to use this wrapper where appropriate. Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
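A rough sketch of what a sysroot-aware join might look like. The standalone function and its `sysroot` keyword are assumptions; the commit only states that a `path_join()` wrapper exists, not how it is written.

```python
import os


def path_join(path, *parts, sysroot="/"):
    """Join path components underneath sysroot (illustrative only)."""
    # Simplified check: prefix the sysroot unless the path already starts with it
    if sysroot and not path.startswith(sysroot):
        path = os.path.join(sysroot, path.lstrip("/"))
    # Strip leading slashes so an absolute component cannot reset the join
    return os.path.join(path, *[p.lstrip("/") for p in parts])
```

Stripping leading slashes matters because plain `os.path.join()` discards everything before an absolute component, which would silently escape the sysroot.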
* [tests] Increase stagetwo log test timeoutJake Hunsaker2021-10-181-0/+1
| | | | | | | | | | | As an interim stopgap measure, increase the timeout for the stagetwo `logs` test to allow for more time for handling random data generation and logging, until we're able to define a better/more efficient way to generate this data within the test suite. Related: #2700 Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
* [tests] Enable verbosity by default for all testsJake Hunsaker2021-10-183-3/+3
| | | | | | | | | | The debug level messages gated by `-v` are very helpful for diagnosing test failures, but currently not all tests specify the use of verbosity. Make use of verbosity a default parameter for all test runs to address this. Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
* [report] Count with sos_logs and sos_reports in --estimate-onlyPavel Moravec2021-10-141-0/+8
| | | | | | | | | | Currently, we estimate just plugins' disk space and ignore sos_logs or sos_reports directories - although they can occupy nontrivial disk space as well. Resolves: #2723 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
* [ata/nvme] Include json formatted for smartctlPonnuvel Palaniyappan2021-10-122-1/+4
| | | | | | Closes: #2720 Signed-off-by: Ponnuvel Palaniyappan <pponnuvel@gmail.com>
* [openvswitch] add commands for offline analysisSalvatore Daniele2021-10-112-2/+11
| | | | | | | | | | | | | | | | | | | | Replicas of ovs-vswitchd and ovsdb-server can be recreated offline using flow, group, and tlv dumps, and ovs conf.db. This allows for offline analysis and the use of tools such as ovs-appctl ofproto/trace and ovs-ofctl for debugging. This patch ensures this information is available in the sos report. The db is copied rather than collected using ovsdb-client list dump for two reasons: ovsdb-client requires interacting with the ovsdb-server which could take it 'down' for some time, and impact large, busy clusters. The list-dump is not in a format that can be used to restore the db offline. All of the information in the list dump is available and more by copying the db. Signed-off-by: Salvatore Daniele <sdaniele@redhat.com>
* [openvswitch] add ovs default OpenFlow protocolsSalvatore Daniele2021-10-111-0/+26
| | | | | | | | | | | | | | ovs-vsctl list bridge can return an empty 'protocol' column even when there are OpenFlow protocols in place by default. ovs-ofctl --version will return the range of supported ofp and should also be used to ensure flow information for relevant protocol versions is collected. OpenFlow default versions: https://docs.openvswitch.org/en/latest/faq/openflow/ Signed-off-by: Salvatore Daniele <sdaniele@redhat.com>
* [tests] Run unit tests under avocado instead of noseJake Hunsaker2021-10-114-48/+6
| | | | | | | | | | | | | | | | | | | | | | | `nose` is no longer maintained, and as of python-3.10 is functionally broken. As such, instead transition to running those tests via avocado, like we do with our integration test suite. The tests themselves do not need much modification, however due to the isolation provided for executing the tests we do need to explicitly set a new PYTHONPATH env var for those executions. This means we still need to run the unit tests as a separate step from the stageone tests. The changes needed are mostly around file paths relative to the pwd where the tests are executed from originally. Additionally, remove the sosreport_pexpect unit test as it is no longer useful in its own right, would need more significant changes to run properly with avocado, and the integration test suite provides better coverage for what it was testing. Closes: #2716 Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
* [networking] retrieve devlink port attributesAntoine Tenart2021-10-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | When using sub-functions[1] gathering the devlink port attributes does provide value when debugging. If there is no devlink port, the output is empty. Example: pci/0000:04:00.0/65535: type eth netdev enp4s0f0 flavour physical port 0 splittable false pci/0000:04:00.0/32768: type eth netdev en4f0pf0sf42 flavour pcisf controller 0 pfnum 0 sfnum 42 splittable false function: hw_addr 00:00:00:00:88:88 state active opstate attached pci/0000:04:00.0/32769: type eth netdev en4f0pf0sf1 flavour pcisf controller 0 pfnum 0 sfnum 1 splittable false function: hw_addr 00:00:00:00:00:00 state active opstate attached auxiliary/mlx5_core.sf.4/131072: type eth netdev enp4s0f0s42 flavour virtual port 0 splittable false auxiliary/mlx5_core.sf.5/196608: type eth netdev enp4s0f0s1 flavour virtual port 0 splittable false [1] https://www.kernel.org/doc/html/latest/networking/devlink/devlink-port.html#subfunction Signed-off-by: Antoine Tenart <atenart@kernel.org>
* [report] Overwrite pred=None before referring predicate attributesPavel Moravec2021-10-061-0/+2
| | | | | | | | | | During a dry run, add_journal method sets pred=None whilst log_skipped_cmd refers to predicate attributes. In that case, replace None predicate by a default / empty predicate. Resolves: #2711 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
* [gcp] Adding Google Cloud pluginMaciej Strzelczyk2021-10-061-0/+145
| | | | | | | | | | | | | | | | | Adding a plugin for Google Cloud Compute Engine VMs. The plugin will collect data about Google services running on the system (journalctl -u google*), data from the Metadata Server that's available for every instance and output of `gcloud auth list` - if available. Available option: keep-pii - if set, the plugin won't remove the project name and project number from the metadata.json file. Closes #2699 Signed-off-by: Maciej Strzelczyk <strzelczyk@google.com>
* [insights] collect connection check command outputPavel Moravec2021-10-051-0/+5
| | | | | | | | | | Collect 'insights-client --test-connection --net-debug' cmdout with a limited timeout to prevent the plugin from being stuck for too long in case of a networking/proxy issue. Resolves: #2704 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
* [foreman] Collect puma status and statsPavel Moravec2021-10-051-1/+20
| | | | | | | | | Collect foreman-puma-status and 'pumactl [gc-|]stats', optionally using SCL (if detected). Resolves: #2712 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
* [Policy] Fix assignment of case id promptJake Hunsaker2021-09-301-4/+5
| | | | | | | | | | | | | | | | When prompting for a case id, `Policy` was not properly updating the option value, only assigning the value to `Policy` which meant that aspects outside of `Policy` could not always properly reference the (updated) case id. Fix this by assigning the case id prompt response back to the case_id option value. `Policy` still retains a local reference to case_id as existing logic was setting that based on the (assumed-to-be-updated) option value, which until this commit would have been superfluous. Closes: #2707 Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
* [networking] prevent iptables-save commands to load nf_tables kmodPavel Moravec2021-09-301-5/+24
| | | | | | | | | | | | If iptables has built-in nf_tables kmod, then 'ip netns <foo> iptables-save' command requires the kmod which must be guarded by predicate. Analogously for ip6tables. Resolves: #2703 Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
* [Systemd, Policy] Correct InitSystem chrooting when chroot is neededJake Hunsaker2021-09-304-32/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | This commit resolves a situation in which `sos` is being run in a container but the `SystemdInit` InitSystem would not properly load information from the host, thus causing the `Plugin.is_service*()` methods to erroneously fail or return `False`. Fix this scenario by pulling the `_container_init()` and related logic to check for a containerized host sysroot out of the Red Hat specific policy and into the base `LinuxPolicy` class so that the init system can be initialized with the correct sysroot, which is now used to chroot the calls to the relevant `systemctl` commands. For now, this does impose the use of looking for the `container` env var (automatically set by docker, podman, and crio regardless of distribution) and the use of the `HOST` env var to read where the host's `/` filesystem is mounted within the container. If desired in the future, this can be changed to allow policy-specific overrides. For now however, this extends host collection via an sos container for all distributions currently shipping sos. Note that this issue only affected the `InitSystem` abstraction for loading information about local services, and did not affect init system related commands called by plugins as part of those collections. Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
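The environment-based detection described above can be sketched as follows. The function name is hypothetical and the fallback to `/` when `HOST` is unset is an assumption; only the use of the `container` and `HOST` variables comes from the commit message.

```python
import os


def detect_sysroot(environ=None):
    """Return the host sysroot when running inside a container, else '/'."""
    environ = os.environ if environ is None else environ
    # 'container' is set by the container runtime; HOST says where the
    # host's / filesystem is mounted inside the container
    if environ.get("container") and environ.get("HOST"):
        return environ["HOST"]
    return "/"
```

The returned value would then be the directory to chroot into before calling `systemctl`, so that service state is read from the host rather than from the container.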
* [collect] Abstract transport protocol from SoSNodeJake Hunsaker2021-09-2713-344/+705
| | | | | | | | | | | | | | | | | | Since its addition to sos, collect has assumed the use of a system installation of SSH in order to connect to the nodes identified for collection. However, there may be use cases and desires to use other transport protocols. As such, provide an abstraction for these protocols in the form of the new `RemoteTransport` class that `SoSNode` will now leverage. So far an abstraction for the currently used SSH ControlPersist function is provided, along with a pseudo abstraction for local execution so that SoSNode does not directly need to make more "if local then foo" checks than are absolutely necessary. Related: #2668 Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
* [origin] Removing unused configurationVladislav Walek2021-09-241-34/+19
| | | | | | | | | | | | | | | Removed the diagnostics part as it is no longer maintained and doesn't work on Openshift. Adding additional projects to collect. Removed getting all namespaces as it is not needed for troubleshooting and project names are sensitive for some customers. Adding condition to collect the logs from systemd openshift services if not running as static pods. Signed-off-by: Vladislav Walek <22072258+vwalek@users.noreply.github.com>