Hey fellow devs!
Today, I’m excited to share a cool project I recently wrapped up—a Text File Concatenator built using Python and PyQt5. Whether you’re into software engineering, data analysis, or AI, this tool might just find a place in your workflow. Let’s dive into what this program does, why I built it, and the journey I took to bring it to life.
What Does the Text File Concatenator Do?
At its core, the Text File Concatenator is a user-friendly GUI application that allows you to:
- Select multiple files or directories containing text files.
- Specify preferences like which file types to include, directories to ignore, and whether to consider only Git-tracked files.
- Concatenate the selected text files into a single output file.
- Choose output options such as saving the concatenated content to a file, copying it to the clipboard, or both.
- Preview the results and view any errors that occurred during the process.
Think of it as a streamlined way to unify code snippets, logs, documentation, or any collection of text files into one consolidated document. It’s like having a Swiss Army knife for your text files!
Why Did I Build This Tool?
The motivation behind this project stemmed from a common challenge I faced: organizing and managing multiple text files efficiently. Whether it's combining logs from different runs, aggregating code snippets for analysis, or simply keeping documentation tidy, manually handling numerous files can be tedious and error-prone.
Here’s what inspired me:
- Efficiency: Automate the mundane task of concatenating files to save time.
- Customization: Offer flexible options to include/exclude specific files or directories based on patterns.
- Integration with Git: Optionally restrict the input to Git-tracked files, which is especially useful in collaborative environments.
- User-Friendly Interface: Make it accessible for those who might not be comfortable with command-line operations.
Breaking Down the Key Components
Let’s unpack the main components that make this application tick:
1. FileConcatenatorThread Class
This is the heart of the application’s functionality, handling the heavy lifting of file processing in a separate thread to keep the GUI responsive.
```python
from PyQt5.QtCore import QThread, pyqtSignal

class FileConcatenatorThread(QThread):
    progress_update = pyqtSignal(int, str)  # Emit progress percent and current file
    status_update = pyqtSignal(str)
    error_occurred = pyqtSignal(str)
    finished_successfully = pyqtSignal(str, list, list)  # Emit concatenated text, list of files, error list
    ...
```
- Signals: Communicate progress, status updates, errors, and completion back to the main GUI thread.
- Run Method: Traverses selected directories, filters files based on user preferences, reads content, and concatenates them.
- Git Integration: Checks if files are tracked by Git if that option is selected.
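The post describes what the run method does but doesn't show it. The function below is a minimal, Qt-free sketch of that traverse/filter/read/concatenate loop, not the project's actual code. The function name, the (percent, path) progress callback standing in for progress_update, the lowercase dotted extension list, and the "===== path =====" separator between files are all assumptions for illustration. Ignore patterns are matched with shell-style wildcards via fnmatch.

```python
import fnmatch
import os
from typing import Callable, List, Optional, Sequence, Tuple

def concatenate_text_files(
    roots: Sequence[str],
    extensions: Sequence[str],
    dir_ignore_patterns: Sequence[str] = (),
    file_ignore_patterns: Sequence[str] = (),
    progress: Optional[Callable[[int, str], None]] = None,
) -> Tuple[str, List[str], List[str]]:
    """Hypothetical sketch: walk roots, keep matching files, and join their contents.

    Returns (concatenated_text, included_files, errors), mirroring the
    (str, list, list) payload of finished_successfully.
    """
    candidates: List[str] = []
    for root in roots:
        if os.path.isfile(root):
            candidates.append(root)  # explicitly selected files are always kept
            continue
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune ignored directories in place so os.walk never descends into them.
            dirnames[:] = sorted(
                d for d in dirnames
                if not any(fnmatch.fnmatch(d, pat) for pat in dir_ignore_patterns)
            )
            for name in sorted(filenames):
                if any(fnmatch.fnmatch(name, pat) for pat in file_ignore_patterns):
                    continue
                if os.path.splitext(name)[1].lower() in extensions:
                    candidates.append(os.path.join(dirpath, name))

    parts: List[str] = []
    included: List[str] = []
    errors: List[str] = []
    for i, path in enumerate(candidates, start=1):
        try:
            with open(path, "r", encoding="utf-8") as f:
                parts.append(f"===== {path} =====\n{f.read()}\n")
            included.append(path)
        except (OSError, UnicodeDecodeError) as exc:
            errors.append(f"{path}: {exc}")  # collect errors instead of aborting
        if progress is not None:
            progress(int(i * 100 / len(candidates)), path)
    return "".join(parts), included, errors
```

Inside a QThread, the callback body would presumably become a progress_update.emit(percent, path) call, and the returned tuple would be sent through finished_successfully. Collecting errors per file, rather than raising, is what lets one unreadable file show up in the error list without aborting the whole run.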
2. Qt GUI Sections
The GUI is thoughtfully divided into three main tabs:
- Selection Tab: Allows users to browse and select files or directories using a tree view with lazy loading for performance.
- Preferences Tab: Users can set preferences like ignore patterns, file types to include, and whether to focus on Git-tracked files.
- Output Tab: Choose how to handle the concatenated output (save to a file, copy to clipboard, or both) and preview the results.
3. Managing User Preferences
User preferences are handled with flexibility:
- Ignore Patterns: Users can specify directories or file patterns to exclude from the concatenation process.
- File Types: A list of default and custom file extensions that determine which files are included.
- Configuration Persistence: Preferences are saved to a config.json file, so settings persist across sessions.
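```python
import json
import os

CONFIG_FILE = 'config.json'  # assumed value; the post only says preferences live in config.json

def load_config(self):
    if os.path.exists(CONFIG_FILE):
        try:
            with open(CONFIG_FILE, 'r', encoding='utf-8') as f:
                config = json.load(f)
            self.directory_ignore_patterns = config.get('directory_ignore_patterns', self.directory_ignore_patterns)
            self.file_ignore_patterns = config.get('file_ignore_patterns', self.file_ignore_patterns)
            for ext in config.get('custom_filetypes', []):
                # Skip extensions already present so repeated loads don't add duplicates
                if ext not in self.default_file_extensions:
                    self.default_file_extensions.append(ext)
        except Exception:
            pass  # Ignore unreadable or malformed config and keep the defaults
```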
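The post doesn't show how preferences are written back. Here is a hypothetical save/load pair (save_config and read_config are illustrative names, not the project's): the save writes to a temporary file and swaps it in with os.replace, so a crash mid-write is less likely to leave a truncated config.json, and the load merges whatever was stored over a dict of defaults. Only the three keys used in load_config above come from the post.

```python
import json
import os
from typing import Any, Dict, Optional

CONFIG_FILE = "config.json"

def save_config(prefs: Dict[str, Any], path: str = CONFIG_FILE) -> None:
    """Write preferences to a temp file, then replace the real file in one step."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(prefs, f, indent=2)
    os.replace(tmp_path, path)  # the rename is atomic on POSIX

def read_config(path: str = CONFIG_FILE,
                defaults: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return defaults overlaid with any stored values; fall back to defaults on error."""
    config = dict(defaults or {})
    try:
        with open(path, "r", encoding="utf-8") as f:
            stored = json.load(f)
    except (OSError, json.JSONDecodeError):
        return config  # missing or corrupt file: keep the defaults
    if isinstance(stored, dict):
        config.update(stored)
    return config
```

Merging over defaults means a config.json written by an older version, with fewer keys, still loads cleanly.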
Potential Use Cases
This tool isn't just a nifty developer gadget—it has practical applications across various domains:
- AI/Data Analysis: Aggregate datasets or logs from multiple sources into a single file for streamlined processing.
- Documentation: Combine markdown or text documentation files into one comprehensive guide.
- Code Management: Consolidate code snippets or scripts from different projects for easy reference or deployment.
- Log Aggregation: Merge logs from different services or runs for easier monitoring and debugging.
Challenges and Lessons Learned
No project is without its hurdles, and this one taught me a few valuable lessons:
1. Handling Large Directories
Challenge: Navigating and loading large directory structures efficiently without freezing the GUI.
Solution: Implemented lazy loading in the tree view, where child items are only loaded when a parent item is expanded. This significantly improved performance and user experience.
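```python
def handle_item_expanded(self, item):
    # A lone "Loading..." child is the placeholder added when the node was created;
    # its presence means this directory hasn't been listed yet.
    if item.childCount() == 1 and item.child(0).text(0) == "Loading...":
        item.takeChildren()  # Remove dummy
        self.add_children(item, item.text(1))  # item.text(1) is passed as the directory to list
```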
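The add_children method called above isn't shown in the post. To illustrate the placeholder pattern without a running Qt application, here is a small plain-Python model: Node stands in for QTreeWidgetItem, and both Node and this add_children are hypothetical. Each subdirectory gets a single "Loading..." child when it is created, and expanding it swaps that placeholder for one real level of entries.

```python
import os
from dataclasses import dataclass, field
from typing import List

PLACEHOLDER = "Loading..."

@dataclass
class Node:
    """Stand-in for a QTreeWidgetItem: a display name plus the path it represents."""
    name: str
    path: str
    children: List["Node"] = field(default_factory=list)

def add_children(parent: Node) -> None:
    """List one directory level; subdirectories get a placeholder instead of being scanned."""
    try:
        with os.scandir(parent.path) as it:
            # Directories first, then files, each group sorted case-insensitively.
            entries = sorted(it, key=lambda e: (not e.is_dir(), e.name.lower()))
    except OSError:
        return  # unreadable directory: leave it empty
    for entry in entries:
        child = Node(entry.name, entry.path)
        if entry.is_dir():
            child.children.append(Node(PLACEHOLDER, ""))  # dummy so the node looks expandable
        parent.children.append(child)

def expand(node: Node) -> None:
    """Mirror of handle_item_expanded: replace the placeholder with real children once."""
    if len(node.children) == 1 and node.children[0].name == PLACEHOLDER:
        node.children.clear()
        add_children(node)
```

Because only one directory level is read per expansion, opening a tree at a large root costs a single directory listing rather than a full recursive walk.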
2. Ensuring Thread Safety
Challenge: Updating the GUI from a separate thread can lead to unpredictable behavior.
Solution: Used PyQt’s signal-slot mechanism to communicate safely between the worker thread (FileConcatenatorThread) and the main GUI thread.
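Qt’s cross-thread signals behave like a message queue: with the default connection type, a signal emitted from the worker thread is queued, and the connected slot runs later in the thread that owns the receiving object (here, the GUI thread). The sketch below reproduces that idea with the standard library’s threading and queue modules instead of PyQt, so it is an analogy rather than the app’s code; worker and main_loop are hypothetical names. The worker only posts (kind, payload) messages and never touches shared state; only the main thread consumes them.

```python
import queue
import threading
from typing import List, Tuple

def worker(files: List[str], events: "queue.Queue[Tuple[str, object]]") -> None:
    """Background thread: posts messages instead of touching UI state directly."""
    for i, name in enumerate(files, start=1):
        events.put(("progress", (int(i * 100 / len(files)), name)))
    events.put(("finished", len(files)))

def main_loop(files: List[str]) -> List[Tuple[str, object]]:
    """Main thread: the only place messages are handled, like slots on the GUI thread."""
    events: "queue.Queue[Tuple[str, object]]" = queue.Queue()
    t = threading.Thread(target=worker, args=(files, events), daemon=True)
    t.start()
    handled = []
    while True:
        kind, payload = events.get()  # blocks until the worker posts something
        handled.append((kind, payload))
        if kind == "finished":
            break
    t.join()
    return handled
```

In the PyQt version, the event loop plays the role of the while loop, and each emit() on the worker side corresponds to an events.put().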
3. Cross-Platform Git Integration
Challenge: Ensuring that Git commands work seamlessly across different operating systems.
Solution: Used Python’s subprocess module to interact with Git, catching exceptions so the application degrades gracefully when Git isn’t installed or no repository is found.
```python
import os
import subprocess

def is_git_tracked(self, filepath):
    try:
        # Get the repository root
        repo_root = self.get_git_repo_root(filepath)
        if not repo_root:
            return False
        # Get the relative path to the repo root
        rel_path = os.path.relpath(filepath, repo_root)
        # --error-unmatch makes git exit non-zero if the path is not in the index
        result = subprocess.run(['git', 'ls-files', '--error-unmatch', rel_path],
                                cwd=repo_root,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                text=True)
        return result.returncode == 0
    except Exception:
        # Also covers a missing git executable (FileNotFoundError)
        return False
```
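Calling git ls-files once per file spawns one subprocess per candidate, which adds up on large trees. An alternative (not what the post’s code does) is to ask Git once per repository for every tracked path and then test set membership. The post references get_git_repo_root without showing it, so the version here, built on git rev-parse --show-toplevel, is an assumption, as is the tracked_files helper. The -z flag makes git ls-files separate paths with NUL bytes and print them unquoted, so unusual filenames survive intact.

```python
import os
import subprocess
from typing import Optional, Set

def get_git_repo_root(path: str) -> Optional[str]:
    """Return the top-level directory of the repo containing path, or None."""
    directory = path if os.path.isdir(path) else os.path.dirname(os.path.abspath(path))
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            cwd=directory,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
    except OSError:  # git not installed, or the directory is unusable
        return None
    if result.returncode != 0:  # not inside a Git work tree
        return None
    return os.path.normpath(result.stdout.strip())

def tracked_files(repo_root: str) -> Set[str]:
    """Return normalized absolute paths of every file Git tracks, using one git call."""
    result = subprocess.run(
        ["git", "ls-files", "-z"],
        cwd=repo_root,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        check=True,  # raises CalledProcessError if repo_root is not a repository
    )
    return {
        os.path.normpath(os.path.join(repo_root, rel))
        for rel in result.stdout.split("\0")
        if rel
    }
```

During the walk, a check such as os.path.normpath(os.path.abspath(path)) in tracked can then replace the per-file subprocess call. Git may report the repository root with symlinks resolved, so os.path.realpath on the candidate path may be needed when a checkout sits behind a symlink.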
Future Enhancements
While the current version is robust, there’s always room for improvement. Here are some advanced features I’m considering:
- Multi-Language Support: Adding support for more programming languages by recognizing language-specific file types.
- Real-Time Monitoring: Automatically update the concatenated output as files change.
- Advanced Filtering: Implement regex-based filters for more granular control over included/excluded files.
- Integration with Cloud Services: Allow saving the output directly to cloud storage platforms like AWS S3 or Google Drive.
- Plugin System: Enable third-party developers to add custom functionalities through plugins.
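Regex filtering is still on the roadmap, so the following is only one possible shape for it: lists of include and exclude regular expressions searched against each path, with backslashes normalized to forward slashes so a pattern behaves the same on Windows and POSIX. regex_filter and its parameters are hypothetical.

```python
import re
from typing import Iterable, List, Sequence

def regex_filter(paths: Iterable[str],
                 include: Sequence[str] = (),
                 exclude: Sequence[str] = ()) -> List[str]:
    """Keep a path if it matches any include regex (or none are given) and no exclude regex."""
    inc = [re.compile(p) for p in include]
    exc = [re.compile(p) for p in exclude]
    kept = []
    for path in paths:
        normalized = path.replace("\\", "/")  # same patterns on Windows and POSIX
        if inc and not any(r.search(normalized) for r in inc):
            continue
        if any(r.search(normalized) for r in exc):
            continue
        kept.append(path)  # return the original spelling, not the normalized one
    return kept
```

Compiling each pattern once up front keeps the per-file cost to a few regex searches, which matters when the walk yields thousands of paths.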
Showcasing My Software Engineering Skillset
This project is a testament to my commitment to building well-structured, maintainable, and user-centric tools. Here’s how it reflects my skills:
- GUI Development with PyQt5: Demonstrates proficiency in creating intuitive and responsive user interfaces.
- Multithreading: Efficiently manages background processes to ensure a smooth user experience.
- Configuration Management: Implements persistent settings using JSON, showcasing attention to user preferences and data management.
- Git Integration: Highlights my ability to interact with version control systems programmatically.
- Error Handling: Ensures robustness by gracefully managing exceptions and providing meaningful feedback to users.
- Best Practices: Follows coding standards, modular design, and clear documentation, making the codebase easy to navigate and maintain.
Moreover, the tool seamlessly ties into data workflows, allowing for efficient data aggregation and preparation, which are crucial in AI and data engineering tasks.
Wrapping Up
Building the Text File Concatenator was an enriching experience that honed my skills in Python, PyQt, and software design principles. It addressed a real-world problem I encountered and transformed it into a practical solution that can benefit others in the developer community.
Whether you’re looking to streamline your documentation, manage code snippets, or aggregate data logs, I hope this tool proves useful. Feel free to check out the GitHub repository (link to repo) and give it a try!
Download the Text File Concatenator
You can download the executable version of the Text File Concatenator directly from the GitHub Releases page.
— Michael