Say I have a large txt or CSV file with data I want to search. And say I have several files.

What is the best way to index and make this data searchable? I’ve been using grep, but it is not ideal.

Is there a self-hostable Docker container for indexing and searching this? Or should I maybe use SQL?

  • h0bbl3s@lemmy.world
    4 months ago

    You can import CSV files directly into an SQLite database. Then you are free to run whatever SQL searches or manipulations you want. Python and Go both have great SQLite libraries in my experience, but I’d be surprised if there is any language that doesn’t have a decent one.

    If you are running Linux, most distros have an SQLite GUI browser in their repos that is pretty powerful.
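A minimal sketch of the CSV-to-SQLite import described above, using only Python's standard library. The file names, the `people` table, and the sample rows are made up for illustration; a real file would need its own column list.

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical sample data standing in for one of the real CSV files.
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "people.csv")
with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerows([
        ["name", "city"],
        ["Ada", "London"],
        ["Linus", "Helsinki"],
        ["Grace", "New York"],
    ])

# A single-file database; no server or container is involved.
conn = sqlite3.connect(os.path.join(workdir, "people.db"))
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")

with open(csv_path, newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    conn.executemany("INSERT INTO people VALUES (?, ?)", reader)

# An index keeps exact-match lookups on this column fast as the table grows.
conn.execute("CREATE INDEX idx_people_city ON people (city)")
conn.commit()

rows = conn.execute(
    "SELECT name FROM people WHERE city = ?", ("London",)
).fetchall()
print(rows)
```

The resulting `people.db` file can then be opened in any SQLite GUI browser, or queried again later without re-importing.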

    • Ephera@lemmy.ml
      4 months ago

      I’d be surprised if there is any language that doesn’t have a decent one.

      Yeah, SQLite provides a library implemented in C. Because C doesn’t require a runtime, it’s possible for other languages to call into this C library. All you need is a relatively thin wrapper library, which provides an API that feels good to use in the respective language.
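As a small illustration of the wrapper idea, Python's built-in `sqlite3` module is one such binding over the C library. The sketch below asks for the C library's version twice, once through the wrapper's attribute and once through SQL executed by the library itself; the two are expected to agree.

```python
import sqlite3

# Version of the SQLite C library that Python's wrapper module is using.
print(sqlite3.sqlite_version)

# The same information, reported by the C library itself through SQL.
conn = sqlite3.connect(":memory:")
(version_from_c,) = conn.execute("SELECT sqlite_version()").fetchone()
print(version_from_c)
```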

  • Eager Eagle@lemmy.world
    4 months ago

    Excel / OnlyOffice?

    I love self-hosted tools, but you can do a lot on a spreadsheet.

    Btw, if the files are not too large, you can query them using SQL without even hosting a database just by using Pandas. Since the data is read fresh from the files each time, this avoids the problem of updating entries and handling migrations in case the CSVs change over time.
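A rough sketch of the "no hosted database" idea. Note this swaps pandas for the standard library: each CSV is loaded into an in-memory SQLite database on every run, with the table columns taken from the CSV header, so there is nothing to migrate when a file changes. The `load_csv` helper, the table names, and the sample data are all hypothetical.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, fileobj):
    """Create `table` from the CSV header and load every row as text."""
    reader = csv.reader(fileobj)
    header = next(reader)
    # Quote identifiers so headers containing spaces still work.
    cols = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)


# Stand-ins for two CSV files; real code would pass open(path, newline="").
orders = io.StringIO(
    "order_id,customer_id,total\n1,10,9.50\n2,11,20.00\n3,10,5.25\n"
)
customers = io.StringIO("customer_id,name\n10,Ada\n11,Linus\n")

conn = sqlite3.connect(":memory:")  # rebuilt from the files on every run
load_csv(conn, "orders", orders)
load_csv(conn, "customers", customers)

rows = conn.execute(
    """
    SELECT c.name, COUNT(*) AS n_orders
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)
```

Every value is stored as text here, which is fine for searching and joining on IDs but would need explicit casts for numeric comparisons.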

    • morbidcactus@lemmy.ca
      4 months ago

      Postgres runs well in a container in my experience and is nice to work with, so I definitely support that. I know SQLite works well too, no complaints from me.

  • megaman@discuss.tchncs.de
    4 months ago

    Datasette is a neat tool intended to publish static data in a SQLite database on the web, with a helpful GUI and a bunch of extensions available. I haven’t come across a good enough reason to use it myself, but it may do what you want.

    You can spin it up locally and it won’t be on the web at all, just accessed via your browser, if that’s what you want.

  • Anna@lemmy.ml
    4 months ago

    Depends on the size of the data and the use case. Will you be doing constant updates to it, or just reading? You mentioned you have several files, so do you need joins? If so, roughly what is the maximum number of joins you’ll be doing per query? You said CSV, so I’m assuming it is structured data and not semi-structured or unstructured.

    A few more questions: do you need fast indexing without planning on any complex operations? Are you going to do a lot of OLTP operations and need ACID, or are you going the OLAP route? Are you planning on a distributed database, and if so, which two do you want from CAP? Do you want batch processing or stream processing?

    I’ve a few dozen other questions also.