Ward Pond's SQL Server blog

Ruminating on issues pertinent to the design and development of sound databases and processes under Microsoft SQL Server 2008, SQL Server 2005, and SQL Server 2000 (while reserving the right to vent about anything else that's on my mind)

Database Programming: Denis' Prime Number Challenge

Database Programming: Denis' Prime Number Challenge

  • Comments 11
  • Likes

'Denis the SQL Menace' read Wednesday night's post on "the million GUID march" and offered up a challenge of his own:

How about the next challenge is to return all 78498 prime numbers between 1 and 1000000?

What an interesting idea!  Crazy, in a sense (I'll get to that in a second), like most interesting ideas, but it's an idea I'm just crazy enough to explore for a couple of evenings.  In the process of that exploration I've so far learned about:

  • calculated columns, and when to persist them (in general, apply the PERSISTED directive to a calculated column only if you need to create an index on it)
  • INSTEAD OF INSERT triggers, a cool new SQL Server 2005 feature
  • I re-used Shaun's Common Table Expression, which I still think is groovy.  I'm using it even though it's not the fastest syntax available because I want to keep this code small from a physical resource/transaction log perspective, and I believe Shuan's code best accomplishes that goal.
  • I wrote the first GROUP BY.. HAVING I've written in about three years.

Do NOT Try This At Home

Before we even start, I'd be remiss if I didn't strongly state that SQL Server is exactly the wrong tool to use if you want to solve this issue in the real world.  I'm not a C# guy, but several reliable sources tell me that there's a prime number function in most nGL languages.  Even if there's not, all of the database access in the test is simply about performing the calculations necessary to determine primacy as quickly as possible.

This very easily could and absolutely should be written independent of the database tier.  This is a purely academic exercise (at least until I get two orders of magnitude performance better than I've got now!); to implement this code in its current state in any production environment would be highly irresponsible.

I'm also not a mathemetician, so it's possible that there's a far more elegant method that I could employ.  If you've got one, Denis and I would love to hear it.

Finally, in the words of David Letterman, "Please remember, ladies and gentlemen: this is only an exhibition.  No wagering."

The results I'm sharing should be considered preliminary in that I've yet to run the job to completion (that's a polite way of saying, "it's still running.").  This will take more than a day to run on my laptop (HP nc8430; 2gB RAM).

The performance scale is actually a little distressing (the further it gets into the problem, the slower it goes).  I'll keep tweaking it (and I've got an alternative to the cursor I'm working on as well), but I wanted to get this up on the blog, because I've seen enough of the results to know that it will be correct when (if J) it finishes.  I reserve the right to kill the run (seven hours in as of this writing) and restart.

On With The Show

We're going to apply the lessons learned in the million GUID march, which taught us that we should avoid procedural code wherever possible.  I've already gotten this down from where it would've probably taken ten days to run to the current 1-2 days (estimated; I'll update here when the results are complete).  

At any rate, the first thing we've got to do is build a database.  The CREATE DATABASE statement that SQLMS scripted out for me is pretty verbose; the full text is in a comment to this post here.  Basically what we've got here is a 3gB database with a 5gB log.  If you've got multiple spindles, I'd use them were I you:

CREATE DATABASE [PrimeNumberTest] ON  PRIMARY (

    NAME        = N'PrimeNumberTest',

    FILENAME    = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\PrimeNumberTest.mdf',

    SIZE        = 3GB,

    MAXSIZE     = UNLIMITED,

    FILEGROWTH  = 1GB )

 LOG ON (

    NAME        = N'PrimeNumberTest_log',

    FILENAME    = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\PrimeNumberTest_log.ldf',

    SIZE        = 5GB,

    MAXSIZE     = 2048GB,

    FILEGROWTH  = 1GB )

Once the database is created, we're ready to do the work.  This is a pretty long script; I've interspersed comments throughout the indicate what my thought process is.

This is a multiple-batch solution which might lend itself to folding into a UDF to determine the primacy of a particular number (my next project, and probably of more actual utility than the current actual exercise).

I made several attempts to duplicatie Shaun's approach, but because we have a different task here -- we have to essentially do math on half of a 1M x 1M Cartesian product, rather than simply generate 1M rows to drive an INSERT -- memory and disk space simply became an issue.  So, while we can be set-oriented in a lot of this work, due to hardware limitations I'm going to have to get procedural at one point in the process (if any of you reading this have big enough iron to test the Cartesian product method, please let me know!).

Without further ado, here's the cursor-based script.  It's extensively commented; the comments should serve as an extension of the narrative of this post.

It's really late and I'm going to tweak the cursor alternative a little more (it's looking like a dry well in its current state, but you never know..) and then let whichever alternative strikes me as currently more performant run overnight.  I'll report back on my progress on Monday at the latest.

Thanks for your interest!

     -wp

The script:

USE PrimeNumberTest

GO

 

--  log a status message

SELECT GETDATE() AS [Starting..]

SET NOCOUNT ON

GO

 

--  remove the three tables we're going to use

--  if they're present from a previous run

IF EXISTS (SELECT 1 FROM sys.objects WHERE name = 'SetBuilderDecimal')

    DROP TABLE SetBuilderDecimal

GO

IF EXISTS (SELECT 1 FROM sys.objects WHERE name = 'Candidate')

    DROP TABLE Candidate

GO

IF EXISTS (SELECT 1 FROM sys.objects WHERE NAME = 'ComparisonMatrix')

    DROP TABLE ComparisonMatrix

GO

 

--  create the table to hold the set of values between 1 and 1M

CREATE TABLE SetBuilderDecimal (

    Id DECIMAL(12,2) PRIMARY KEY

)

go

 

--  create the table to hold the set of values to test from primacy

--  (we are going to remove some low-hanging fruit for performance reasons,

--      so we need a distinct table for this purpose) 

CREATE TABLE Candidate (

    CandidateValue  DECIMAL(12,2)

)   

GO

 

--  create the table where the calculations will be performed

--      Base is the number we're testing for primacy

--      Factor is the number we're dividing it by

--      Quotient is the result of the division

--      IsQuotientInteger is a bit flag indicating whether the Quotient is an integer

CREATE TABLE ComparisonMatrix (

    Base DECIMAL(12,2),

    Factor DECIMAL(12,2),

    Quotient AS Base/Factor,

    IsQuotientInteger AS CASE

        WHEN Base/Factor = CAST(Base/Factor AS int) THEN CAST(1 AS int)

        ELSE CAST(0 AS int)

    END           PERSISTED

)   

 

--  build indexes on the ComparisonMatrix table to improve perfomance

CREATE UNIQUE CLUSTERED INDEX ComparisonMatrix_PK ON ComparisonMatrix(Base, Factor)

CREATE INDEX ComparisonMatrix_ID1 ON ComparisonMatrix(IsQuotientInteger)

GO

 

--    build an INSTEAD OF INSERT trigger so we only save the integer factors

--      THIS IS PURELY A PERFORMANCE ISSUE;

--      as discussed below 

--  this syntax is new in SQL Server 2005; the BOL reference may be found here:

--  ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/0001f651-fcfa-401b-baf2-0ec6b1671116.htm

--

--  A BRIEF DISCUSSION:

--  what we're doing here is tuning our INSERT statement for the table CompressionMatrix below to ignore

--      the transactions which do not result in an integer quotient.  This will make the commiting of

--      these INSERT transactions orders of magnitude faster, since only a small fraction of the volume

--      will result in integer quotients

CREATE TRIGGER IOI_ComparisonMatrix

ON    ComparisonMatrix

INSTEAD OF INSERT

AS

INSERT      ComparisonMatrix (

            Base,

            Factor

)

SELECT      Base,

            Factor

FROM  inserted

WHERE IsQuotientInteger = 1

GO

 

--  log a status message

SELECT GETDATE() AS [Tables Created..]

GO

 

--  using Shaun's syntax (because it doesn't have a database footprint),

--  populate SetBuilder

;WITH digits(i) AS (

    SELECT 1.0 AS I UNION ALL SELECT 2.0 UNION ALL SELECT 3.0 UNION ALL

    SELECT 4.0 UNION ALL SELECT 5.0 UNION ALL SELECT 6.0 UNION ALL

    SELECT 7.0 UNION ALL SELECT 8.0 UNION ALL SELECT 9.0 UNION ALL

    SELECT 0.0

),

--      generate 1M rows each with a unique row number, i

sequence(i) AS (

    SELECT d1.i + (10*d2.i) + (100*d3.i) + (1000*d4.i) + (10000*d5.i) + (100000*d6.i)

      FROM digits AS d1,

           digits AS d2,

           digits AS d3,

           digits AS d4,

           digits AS d5,

           digits AS d6

)

INSERT  SetBuilderDecimal (Id)

SELECT  i+1 -- CTE gives us 0-999999; we want 1-1000000

FROM    sequence

 

--  populate Candidate

INSERT  Candidate (CandidateValue)

SELECT  Id

FROM    SetBuilderDecimal

GO

 

--  now delete the low-hanging fruit from Candidate

--    WE ARE ONLY DONG THIS FOR PERFORMANCE; the algorhytm can handle them

--    NOTE that these fifteen DELETE statements, which encompass the first 15 prime numbers and

--      thier multiples, removes 86.13% of the candidate values

DELETE  Candidate WHERE CandidateValue/2 = CAST(CandidateValue/2 AS int) AND CandidateValue > 2

DELETE  Candidate WHERE CandidateValue/3 = CAST(CandidateValue/3 AS int) AND CandidateValue > 3

DELETE  Candidate WHERE CandidateValue/5 = CAST(CandidateValue/5 AS int) AND CandidateValue > 5

DELETE  Candidate WHERE CandidateValue/7 = CAST(CandidateValue/7 AS int) AND CandidateValue > 7

DELETE  Candidate WHERE CandidateValue/11 = CAST(CandidateValue/11 AS int) AND CandidateValue > 11

DELETE  Candidate WHERE CandidateValue/13 = CAST(CandidateValue/13 AS int) AND CandidateValue > 13

DELETE  Candidate WHERE CandidateValue/17 = CAST(CandidateValue/17 AS int) AND CandidateValue > 17

DELETE  Candidate WHERE CandidateValue/19 = CAST(CandidateValue/19 AS int) AND CandidateValue > 19

DELETE  Candidate WHERE CandidateValue/23 = CAST(CandidateValue/23 AS int) AND CandidateValue > 23

DELETE  Candidate WHERE CandidateValue/29 = CAST(CandidateValue/29 AS int) AND CandidateValue > 29

DELETE  Candidate WHERE CandidateValue/31 = CAST(CandidateValue/31 AS int) AND CandidateValue > 31

DELETE  Candidate WHERE CandidateValue/37 = CAST(CandidateValue/37 AS int) AND CandidateValue > 37

DELETE  Candidate WHERE CandidateValue/41 = CAST(CandidateValue/41 AS int) AND CandidateValue > 41

DELETE  Candidate WHERE CandidateValue/43 = CAST(CandidateValue/43 AS int) AND CandidateValue > 43

DELETE  Candidate WHERE CandidateValue/47 = CAST(CandidateValue/47 AS int) AND CandidateValue > 47

 

--  uncomment this SELECT statement to confirm that we have cut 1M candidates down to 138,760

--  SELECT COUNT(*) FROM Candidate

 

GO

--  log a status message

SELECT GETDATE() AS [Tables Populated..]

GO

 

--  keep some house for what comes..

DECLARE @CandidateValue DECIMAL(12,2),

        @Counter DECIMAL (12,2),

        @IntCandidateValue INT

 

--  initalize a counter

SET     @Counter = 1

 

--  declare a cursor to traverse the Candidate table

DECLARE TraverseCandidateTable CURSOR FAST_FORWARD READ_ONLY FOR

SELECT  CandidateValue,

        CAST(CandidateValue AS INT)

FROM    Candidate

ORDER BY CandidateValue

 

--  open the cursor and get the first record

OPEN    TraverseCandidateTable

 

FETCH NEXT

FROM  TraverseCandidateTable

INTO  @CandidateValue,

      @IntCandidateValue

 

--  log a status message

SELECT GETDATE() AS [Cursor Loop Starts..]

 

--  open a transaction

--      (careful transaction control is necessary to avoid rampant growth of the transaction logs)

BEGIN TRAN

 

--  start a loop which will run as long as the cursor finds a record

WHILE @@FETCH_STATUS = 0

BEGIN

 

--  insert the full range of possible factors for the candidate value

--      into the comparison matrix

--      (use the SetBuilderDecimal table to aid this work)

    INSERT  ComparisonMatrix (

        Base,

        Factor

    )

    SELECT  @CandidateValue,

            Id

    FROM    SetBuilderDecimal

    WHERE   Id <= @CandidateValue

 

--  log a database commit anf start a new transaction

--  if it's been 200 transactions since we did so

    IF  @Counter/200.0 = CAST(@Counter/200 AS int)

    BEGIN

        COMMIT TRAN

 

            CHECKPOINT

 

--  log a status message if it's been 1000 transactions since we did so

        IF  @Counter/1000.0 = CAST(@Counter/1000 AS int)

            SELECT GETDATE() AS [One Thousand More..], @Counter AS [Current Count..], @CandidateValue AS [Current Candidate..]

 

        BEGIN TRAN

    END

 

--  get the next candidate value

    FETCH NEXT

    FROM  TraverseCandidateTable

    INTO  @CandidateValue,

          @IntCandidateValue

 

--  increment the final transaction

    SET   @Counter = @Counter + 1

END

 

--  we're out of the loop;

--  we still need to commit the final transaction

COMMIT TRAN

 

--  clost and deallocate the cursor

CLOSE TraverseCandidateTable

 

DEALLOCATE TraverseCandidateTable

 

--  log a status message

SELECT GETDATE() AS [Cursor Loop Complete..]

 

--  generate the list of prime numbers

--      if the SUM(IsQuotientInteger)=2 (where IsQuotientInteger=1 when a particular member of the set

--      is a precise integer factor of the candidate value), then that candidate is a prime number,

--      because IsQuotientInteger will equal 1 for 1 and the candidate itself.  All candidates with

--      additional factors will thus be filtered out by the HAVING cluase.

SELECT  Base

FROM    ComparisonMatrix

GROUP BY Base

HAVING SUM(IsQuotientInteger) = 2

ORDER BY Base
GO

Comments
  • CREATE DATABASE [PrimeNumberTest] ON  PRIMARY (
       NAME        = N'PrimeNumberTest',
       FILENAME    = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\PrimeNumberTest.mdf',
       SIZE        = 3GB,
       MAXSIZE     = UNLIMITED,
       FILEGROWTH  = 1GB )
    LOG ON (
       NAME        = N'PrimeNumberTest_log',
       FILENAME    = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\PrimeNumberTest_log.ldf',
       SIZE        = 5GB,
       MAXSIZE     = 2048GB,
       FILEGROWTH  = 1GB )
    GO
    EXEC dbo.sp_dbcmptlevel @dbname=N'PrimeNumberTest', @new_cmptlevel=90
    GO
    IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))
    BEGIN
       EXEC [PrimeNumberTest].[dbo].[sp_fulltext_database] @action = 'disable'
    END
    GO
    ALTER DATABASE [PrimeNumberTest] SET ANSI_NULL_DEFAULT OFF
    GO
    ALTER DATABASE [PrimeNumberTest] SET ANSI_NULLS OFF
    GO
    ALTER DATABASE [PrimeNumberTest] SET ANSI_PADDING OFF
    GO
    ALTER DATABASE [PrimeNumberTest] SET ANSI_WARNINGS OFF
    GO
    ALTER DATABASE [PrimeNumberTest] SET ARITHABORT OFF
    GO
    ALTER DATABASE [PrimeNumberTest] SET AUTO_CLOSE OFF
    GO
    ALTER DATABASE [PrimeNumberTest] SET AUTO_CREATE_STATISTICS ON
    GO
    ALTER DATABASE [PrimeNumberTest] SET AUTO_SHRINK OFF
    GO
    ALTER DATABASE [PrimeNumberTest] SET AUTO_UPDATE_STATISTICS ON
    GO
    ALTER DATABASE [PrimeNumberTest] SET CURSOR_CLOSE_ON_COMMIT OFF
    GO
    ALTER DATABASE [PrimeNumberTest] SET CURSOR_DEFAULT  GLOBAL
    GO
    ALTER DATABASE [PrimeNumberTest] SET CONCAT_NULL_YIELDS_NULL OFF
    GO
    ALTER DATABASE [PrimeNumberTest] SET NUMERIC_ROUNDABORT OFF
    GO
    ALTER DATABASE [PrimeNumberTest] SET QUOTED_IDENTIFIER OFF
    GO
    ALTER DATABASE [PrimeNumberTest] SET RECURSIVE_TRIGGERS OFF
    GO
    ALTER DATABASE [PrimeNumberTest] SET  ENABLE_BROKER
    GO
    ALTER DATABASE [PrimeNumberTest] SET AUTO_UPDATE_STATISTICS_ASYNC OFF
    GO
    ALTER DATABASE [PrimeNumberTest] SET DATE_CORRELATION_OPTIMIZATION OFF
    GO
    ALTER DATABASE [PrimeNumberTest] SET TRUSTWORTHY OFF
    GO
    ALTER DATABASE [PrimeNumberTest] SET ALLOW_SNAPSHOT_ISOLATION OFF
    GO
    ALTER DATABASE [PrimeNumberTest] SET PARAMETERIZATION SIMPLE
    GO
    ALTER DATABASE [PrimeNumberTest] SET  READ_WRITE
    GO
    ALTER DATABASE [PrimeNumberTest] SET RECOVERY FULL
    GO
    ALTER DATABASE [PrimeNumberTest] SET  MULTI_USER
    GO
    ALTER DATABASE [PrimeNumberTest] SET PAGE_VERIFY CHECKSUM  
    GO
    ALTER DATABASE [PrimeNumberTest] SET DB_CHAINING OFF
    GO

  • Well here is a Sieve of Eratosthenes version, runs in 6 seconds on 2nd run, first run will be around 16 seconds on my laptop (SQL 2000)

    SET NOCOUNT ON


    DECLARE @i INT

    -- Create a 10-digit table
    DECLARE @D TABLE (N INT)
    INSERT INTO @D (N)
    SELECT 0 UNION ALL
    SELECT 1 UNION ALL
    SELECT 2 UNION ALL
    SELECT 3 UNION ALL
    SELECT 4

    INSERT INTO @D (N)
    SELECT N+5 FROM @D

    -- build a small sieve table between 2 and 1000
    DECLARE @T TABLE (N INT)
    INSERT INTO @T( N )
    SELECT 1+A.N+10*(B.N+10*C.N)
    FROM @D A, @D B, @D C

    DELETE FROM @T WHERE N = 1

    SET @I = 2
    WHILE @I <= SQRT(1000) BEGIN     DELETE FROM @T WHERE N % @I = 0 AND N > @I
       SET @I = @I + 1
    END

    -- Create large table between 1001 and 1000000
    SELECT A+10*(B+10*(C+10*(D+10*(E+ 10*F)))) AS N
    INTO #P
    FROM
    (    SELECT A.N AS A, B.N AS B, C.N AS C, D.N AS D, E.N AS E, F.N AS F
       FROM @D A, @D B, @D C, @D D, @D E, @D F
       WHERE A.N in (1, 3, 7, 9)  -- Not divisible by 2 or 5
    ) bNum
    WHERE (A+B+C+D+E+F) % 3 <> 0 -- Or 3
       AND (A+3*B+2*C-D-3*E-2*F) % 7 <> 0 -- Or 7
       AND (B-A+D-C+F-E) % 11 <> 0 -- Or 11
       and D|E|F <> 0 -- Don't include the first 1000 numbers,
    --we already have these in the small sieve table
    UNION ALL SELECT 1000000

    -- sieve the big table with smaller one
    SELECT @I = 2
    WHILE @I IS NOT NULL
    BEGIN
       DELETE FROM #P WHERE N% @I = 0
       SELECT @I = MIN(N) FROM @T WHERE N > @I
    END

    -- add primes up to 1000
    INSERT INTO #P SELECT N FROM @T

    -- Here are the results
    --78498 rows
    SELECT  * FROM #P ORDER BY 1

    drop table #P
    go

  • No sane person would even consider using SQL Server to construct a list of prime numbers. So just to...

  • Let me know how many seconds this one takes?

    SET NOCOUNT ON


    DECLARE @i INT

    -- Create a 10-digit table
    DECLARE @D TABLE (N INT)
    INSERT INTO @D (N)
    SELECT 0 UNION ALL
    SELECT 1 UNION ALL
    SELECT 2 UNION ALL
    SELECT 3 UNION ALL
    SELECT 4

    INSERT INTO @D (N)
    SELECT N+5 FROM @D

    -- build a small sieve table between 2 and 1000
    DECLARE @T TABLE (N INT)
    INSERT INTO @T( N )
    SELECT 1+A.N+10*(B.N+10*C.N)
    FROM @D A, @D B, @D C

    DELETE FROM @T WHERE N = 1

    SET @I = 2
    WHILE @I <= SQRT(1000) BEGIN     DELETE FROM @T WHERE N % @I = 0 AND N > @I
       SET @I = @I + 1
    END

    -- Create large table between 1001 and 1000000
    SELECT A+10*(B+10*(C+10*(D+10*(E+ 10*F)))) AS N
    INTO #P
    FROM
    (    SELECT A.N AS A, B.N AS B, C.N AS C, D.N AS D, E.N AS E, F.N AS F
       FROM @D A, @D B, @D C, @D D, @D E, @D F
       WHERE A.N in (1, 3, 7, 9)  -- Not divisible by 2 or 5
    ) blah
    WHERE (A+B+C+D+E+F) % 3 <> 0 -- Or 3
       AND (A+3*B+2*C-D-3*E-2*F) % 7 <> 0 -- Or 7
       AND (B-A+D-C+F-E) % 11 <> 0 -- Or 11
       AND D|E|F <> 0 -- Don't include the first 1000 numbers,
    --we already have these in the small sieve table
    UNION ALL SELECT 1000000

    -- sieve the big table with smaller one
    SELECT @I = 2
    WHILE @I IS NOT NULL
    BEGIN
       DELETE FROM #P WHERE N% @I = 0
       SELECT @I = MIN(N) FROM @T WHERE N > @I
    END

    -- add primes up to 1000
    INSERT INTO #P SELECT N FROM @T

    -- Here are the results
    --78498 rows
    SELECT  * FROM #P ORDER BY 1

    drop table #P
    go

  • Well, folks, there's a new sheriff in town..&amp;nbsp; this is a continuation of a conversation here and...

  • Our blogging site is going to go dark in an hour or so for a software upgrade, so I won't be able to...

  • One more PDA-incompatible post: lessons learned from &quot;discredited&quot; prime number syntax

  • Denis' Prime Number Challenge just won't die.&amp;nbsp; I think this topic has spurred more dialog than any&amp;nbsp;other...

  • One more PDA-incompatible post: lessons learned from "discredited" prime number syntax

  • No less a figure than Louis Davidson has weighed in on the Prime Number Challenge . I left a comment

  • Hi...Think simple ...here code for find Prime Number in sql server : declare @a int declare @b int select @a = 10 select @b = 10000 while @a < @b BEGIN If ( (convert(float, @a/2.0) <> (@a/2)) AND (convert(float, @a/3.0) <> @a/3) AND (convert(float, @a/5.0) <> @a/5) AND (convert(float, @a/7.0) <> @a/7)) print convert(varchar(5),@a) + ' is a Prime Number' Set @a = @a + 1 END

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment