author Avator

hero Image Blog

Building a Powerful Text Search Application with Python and Tkinter

Blog Web-Development

Written by

Author avatar

Abdul Rafay

Share this Blog Post:

WhatsAppTwitterFacebookLinkedIn

In today’s data-driven world, efficiently searching through large volumes of text is crucial. Whether you’re a researcher looking through academic papers or a developer managing logs, having a robust text search tool can save you time and effort. In this post, I’ll walk you through the process of building a text search application using Python and Tkinter, complete with powerful search algorithms and user-friendly features.

Overview of the Application

Our application allows users to search for specific patterns in text files using two different algorithms: Brute Force and Knuth-Morris-Pratt (KMP). The user interface, built with Tkinter, is designed to be intuitive, allowing users to input search criteria, select algorithms, and view results efficiently.

Key Features

  1. Dual Search Algorithms: Choose between Brute Force and KMP algorithms based on your needs.
  2. Case Sensitivity: Option to perform case-sensitive searches.
  3. Whole Word Matching: Ensure that the search pattern matches whole words only, avoiding partial matches.
  4. Results Display: The application displays file names, row indices, positions of matches, and the time taken for the search.

Code Walkthrough

1. Importing Libraries

We start by importing the necessary libraries:

import pandas as pd
import glob
import time
import tkinter as tk
from tkinter import ttk, messagebox
  • pandas: For data manipulation and managing text file content.
  • glob: For file handling and pattern matching to read specific text files.
  • time: To measure the performance of the search operations.
  • tkinter: For creating a graphical user interface (GUI).

2. Brute Force Search Algorithm

Here’s the implementation of the Brute Force algorithm:

def brute_force_search(text, pattern):
    n = len(text)
    m = len(pattern)
    locations = []

    for i in range(n - m + 1):
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            locations.append(i)

    return locations

Explanation:

  • This function takes a text and a pattern as input.
  • It iterates through the text to check for matches with the pattern.
  • If a match is found, the starting index is stored in the locations list.

3. KMP Search Algorithm

The KMP algorithm is more efficient. Here’s how it works:

def compute_lps(pattern):
    m = len(pattern)
    lps = [0] * m
    length = 0
    i = 1

    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1
    return lps

def kmp_search(text, pattern):
    n = len(text)
    m = len(pattern)
    lps = compute_lps(pattern)
    locations = []
    i = 0
    j = 0

    while i < n:
        if pattern[j] == text[i]:
            i += 1
            j += 1
        if j == m:
            locations.append(i - j)
            j = lps[j - 1]
        elif i < n and pattern[j] != text[i]:
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1

    return locations

Explanation:

  • compute_lps: Preprocesses the pattern to create a longest prefix-suffix (LPS) array.
  • kmp_search: Uses the LPS array to skip unnecessary comparisons, improving search efficiency.

4. Searching the DataFrame

The function to search through the DataFrame containing text file content:

def search_dataframe(df, pattern, algorithm='brute_force', case_sensitive=False, whole_word=False):
    results = []

    if not pattern:
        raise ValueError("Search pattern cannot be empty.")

    if not case_sensitive:
        pattern = pattern.lower()

    search_func = brute_force_search if algorithm == 'brute_force' else kmp_search

    for index, row in df.iterrows():
        filename = row['Filename']
        content = row['Content']

        if not case_sensitive:
            content = content.lower()

        start_time = time.time()
        try:
            positions = search_func(content, pattern)
        except Exception as e:
            raise RuntimeError(f"Error while searching in file {filename}: {str(e)}")

        end_time = time.time()

        if whole_word:
            positions = [pos for pos in positions if (pos == 0 or not content[pos - 1].isalnum()) and
                         (pos + len(pattern) == len(content) or not content[pos + len(pattern)].isalnum())]

        for pos in positions:
            results.append({
                'Filename': filename,
                'Row': index,
                'Column (position)': pos,
                'Time Taken (s)': round(end_time - start_time, 4)
            })

    return results

Explanation:

  • This function takes a DataFrame df, a pattern, and additional parameters for case sensitivity and whole word matching.
  • It iterates through each file in the DataFrame, applies the chosen search algorithm, and records the results.

5. User Interface and Event Handling

We create the GUI and handle user inputs as follows:

def run_search():
    search_text = search_entry.get()
    algorithm = algorithm_choice.get()
    case_sensitive = case_sensitive_var.get()
    whole_word = whole_word_var.get()

    if not search_text:
        messagebox.showerror("Input Error", "Please enter the search text.")
        return

    try:
        results = search_dataframe(df, search_text, algorithm, case_sensitive, whole_word)
        if results:
            output_text.delete(1.0, tk.END)
            output_text.insert(tk.END, "Search Results:\n")
            output_text.insert(tk.END, "-"*50 + "\n")

            for result in results:
                output_text.insert(tk.END, f"Filename: {result['Filename']}\n")
                output_text.insert(tk.END, f"Row: {result['Row']}\n")
                output_text.insert(tk.END, f"Position (Column): {result['Column (position)']}\n")
                output_text.insert(tk.END, f"Time Taken: {result['Time Taken (s)']} seconds\n")
                output_text.insert(tk.END, "-"*50 + "\n")
        else:
            output_text.delete(1.0, tk.END)
            output_text.insert(tk.END, "No matches found.\n")

    except ValueError as ve:
        messagebox.showerror("Search Error", str(ve))
    except RuntimeError as re:
        messagebox.showerror("Search Error", str(re))
    except Exception as e:
        messagebox.showerror("Unexpected Error", f"An unexpected error occurred: {str(e)}")

Explanation:

  • The run_search function is triggered when the user clicks the “Search” button.
  • It retrieves the user’s input, runs the search, and displays results in the output text widget.

6. Loading Files and Creating the Main Window

Finally, we load the text files into a DataFrame and create the main window:

file_pattern = "Research#*.txt"
data = {'Filename': [], 'Content': []}

try:
    files_found = glob.glob(file_pattern)
    if not files_found:
        raise FileNotFoundError(f"No files found matching pattern {file_pattern}")

    for filepath in files_found:
        try:
            with open(filepath, 'r', encoding='utf-8') as file:
                data['Filename'].append(filepath)
                data['Content'].append(file.read())
        except Exception as e:
            raise IOError(f"Error reading file {filepath}: {str(e)}")

    df = pd.DataFrame(data)
    if df.empty:
        raise ValueError("No valid content found in files.")

except FileNotFoundError as fnf_error:
    messagebox.showerror("File Error", str(fnf_error))
except IOError as io_error:
    messagebox.showerror("File Error", str(io_error))
except Exception as e:
    messagebox.showerror("Unexpected Error", f"An unexpected error occurred: {str(e)}")

root = tk.Tk()
root.title("Text Search App")

tk.Label(root, text="Enter search text:").pack(pady=10)
search_entry = tk.Entry(root, width=50)
search_entry.pack(pady=5)

tk.Label(root, text="Choose algorithm:").pack(pady=

10)
algorithm_choice = ttk.Combobox(root, values=["brute_force", "kmp"])
algorithm_choice.pack(pady=5)
algorithm_choice.current(0)

case_sensitive_var = tk.BooleanVar()
whole_word_var = tk.BooleanVar()
tk.Checkbutton(root, text="Case Sensitive", variable=case_sensitive_var).pack(pady=5)
tk.Checkbutton(root, text="Whole Word Matching", variable=whole_word_var).pack(pady=5)

search_button = tk.Button(root, text="Search", command=run_search)
search_button.pack(pady=20)

output_text = tk.Text(root, width=70, height=20)
output_text.pack(pady=10)

root.mainloop()

Explanation:

  • We use glob to find all text files matching the specified pattern.
  • Each file’s content is loaded into a DataFrame.
  • The Tkinter window is created with widgets for user input, algorithm selection, checkboxes for options, and a button to initiate the search.

Conclusion

This text search application demonstrates how to combine Python’s powerful libraries with a user-friendly interface to create a practical tool for searching through text files. With options for different search algorithms and customizable settings, it can be tailored to meet various user needs.

Feel free to clone the code and modify it to fit your requirements! Happy coding!

Comments